  1. Oct 2025
    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      The study analyzes the gastric fluid DNA content identified as a potential biomarker for human gastric cancer. However, the study lacks overall logicality, and several key issues require improvement and clarification. In the opinion of this reviewer, some major revisions are needed:

      (1) This manuscript lacks a comparison of gastric cancer patients' stages with PN and N+PD patients, especially T0-T2 patients.

      We are grateful for this astute remark. A comparison of gfDNA concentration among the diagnostic groups indicates a trend of increasing values as the diagnosis progresses toward malignancy. The observed values for the diagnostic groups are as follows:

      Author response table 1.

      The chart below presents the statistical analyses of the same diagnostic/tumor-stage groups (one-way ANOVA followed by Tukey’s multiple comparison tests). It shows that gfDNA concentrations gradually increase with malignant progression. We observed that the initial tumor stages (T0 to T2) exhibit intermediate gfDNA levels, which are significantly lower than in advanced disease (p = 0.0036) but not statistically different from non-neoplastic disease (p = 0.74).

      Author response image 1.

      (2) The comparison between gastric cancer stages seems only to reveal the difference between T3 patients and early-stage gastric cancer patients, which raises doubts about the authenticity of the previous differences between gastric cancer patients and normal patients, whether it is only due to the higher number of T3 patients.

      We appreciate the attention to detail regarding the numbers analyzed in the manuscript. Importantly, the results are meaningful because the number of subjects in each group is comparable (T0-T2, N = 65; T3, N = 91; T4, N = 63). The mean gastric fluid gfDNA values (ng/µL) increase with disease stage (T0-T2: 15.12; T3-T4: 30.75), and both are higher than the mean gfDNA values observed in non-neoplastic disease (10.81 ng/µL for N+PD and 10.10 ng/µL for PN). These subject numbers in each diagnostic group accurately reflect real-world data from a tertiary cancer center.

      (3) The prognosis evaluation is too simplistic, only considering staging factors, without taking into account other factors such as tumor pathology and the time from onset to tumor detection.

      Histopathological analyses were performed throughout the study not only for the initial diagnosis of tissue biopsies, but also for the classification of Lauren’s subtypes, tumor staging, and the assessment of the presence and extent of immune cell infiltrates. Regarding the time of disease onset, this variable is, by definition, unknown at the time of a diagnostic EGD. While the prognosis definition is indeed straightforward, we believe that a simple, cost-effective, and practical approach is advantageous for patients across diverse clinical settings and is more likely to be effectively integrated into routine EGD practice.

      (4) The comparison between gfDNA and conventional pathological examination methods should be mentioned, reflecting advantages such as accuracy and patient comfort.

      We wish to reinforce that EGD, along with conventional histopathology, remains the gold standard for gastric cancer evaluation. EGD under sedation is routinely performed for diagnosis, and the collection of gastric fluids for gfDNA evaluation does not affect patient comfort. Thus, while gfDNA analysis was evidently not intended as a diagnostic EGD and biopsy replacement, it may provide added prognostic value to this exam.

      (5) There are many questions in the figures and tables. Please match the Title, Figure legends, Footnote, Alphabetic order, etc.

      We are grateful for these comments and apologize for the clerical oversight. All figures, tables, titles and figure legends have now been double-checked.

      (6) The overall logicality of the manuscript is not rigorous enough, with few discussion factors, and cannot represent the conclusions drawn.

      We assume that the unusual wording remark regarding “overall logicality” pertains to the rationale and/or reasoning of this investigational study. Our working hypothesis was that during neoplastic disease progression, tumor cells continuously proliferate and, depending on various factors, attract immune cell infiltrates. Consequently, both tumor cells and immune cells (as well as tumor-derived DNA) are released into the fluids surrounding the tumor at its various locations, including blood, urine, saliva, gastric fluids, and others. Thus, increases in DNA levels within some of these fluids have been documented and are clinically meaningful. The concurrent observation of elevated gastric fluid gfDNA levels and immune cell infiltration supports the hypothesis that increased gfDNA—which may originate not only from tumor cells but also from immune cells—could be associated with better prognosis, as suggested by this study of a large real-world patient cohort.

      In summary, we thank Reviewer #1 for his time and effort in a constructive critique of our work.

      Reviewer #2 (Public review):

      Summary:

      The authors investigated whether the total DNA concentration in gastric fluid (gfDNA), collected via routine esophagogastroduodenoscopy (EGD), could serve as a diagnostic and prognostic biomarker for gastric cancer. In a large patient cohort (initial n=1,056; analyzed n=941), they found that gfDNA levels were significantly higher in gastric cancer patients compared to non-cancer, gastritis, and precancerous lesion groups. Unexpectedly, higher gfDNA concentrations were also significantly associated with better survival prognosis and positively correlated with immune cell infiltration. The authors proposed that gfDNA may reflect both tumor burden and immune activity, potentially serving as a cost-effective and convenient liquid biopsy tool to assist in gastric cancer diagnosis, staging, and follow-up.

      Strengths:

      This study is supported by a robust sample size (n=941) with clear patient classification, enabling reliable statistical analysis. It employs a simple, low-threshold method for measuring total gfDNA, making it suitable for large-scale clinical use. Clinical confounders, including age, sex, BMI, gastric fluid pH, and PPI use, were systematically controlled. The findings demonstrate both diagnostic and prognostic value of gfDNA, as its concentration can help distinguish gastric cancer patients and correlates with tumor progression and survival. Additionally, preliminary mechanistic data reveal a significant association between elevated gfDNA levels and increased immune cell infiltration in tumors (p=0.001).

      Reviewer #2 has conceptually grasped the overall rationale of the study quite well, and we are grateful for their assessment and comprehensive summary of our findings.

      Weaknesses:

      (1) The study has several notable weaknesses. The association between high gfDNA levels and better survival contradicts conventional expectations and raises concerns about the biological interpretation of the findings.

      We agree that this would be the case if the gfDNA were derived solely from tumor cells. However, the findings presented here suggest that a fraction of this DNA is indeed derived from infiltrating immune cells. The precise origin of this increased gfDNA remains to be determined in planned follow-up studies applying DNA- and RNA-sequencing methodologies and deconvolution analyses.

      (2) The diagnostic performance of gfDNA alone was only moderate, and the study did not explore potential improvements through combination with established biomarkers. Methodological limitations include a lack of control for pre-analytical variables, the absence of longitudinal data, and imbalanced group sizes, which may affect the robustness and generalizability of the results.

      Reviewer #2 is correct that this investigational study was not designed to assess the diagnostic potential of gfDNA. Instead, its primary contribution is to provide useful prognostic information. In this regard, we have not yet explored combining gfDNA with other clinically well-established diagnostic biomarkers. We do acknowledge this current limitation as a logical follow-up that must be investigated in the near future.

      Moreover, we collected a substantial number of pre-analytical variables within the limitations of a study involving over 1,000 subjects. Longitudinal samples and data were not analyzed here, as our aim was to evaluate prognostic value at diagnosis. Although the groups are imbalanced, this accurately reflects the real-world population of a large endoscopy center within a dedicated cancer facility. Subjects were invited to participate and enter the study before sedation for the diagnostic EGD procedure; thus, samples were collected prospectively from all consenting individuals.

      Finally, to maintain a large, unbiased cohort, we did not attempt to balance the groups, allowing analysis of samples and data from all patients with compatible diagnoses (please see Results: Patient groups and diagnoses).

      (3) Additionally, key methodological details were insufficiently reported, and the ROC analysis lacked comprehensive performance metrics, limiting the study's clinical applicability.

      We are grateful for this useful suggestion. In the current version, each ROC curve (Supplementary Figures 1A and 1B) now includes the top 10 gfDNA thresholds, along with their corresponding sensitivity and specificity values (please see Suppl. Table 1). The thresholds are ordered from best to worst based on the classic Youden’s J statistic, as follows:

      Youden Index = specificity + sensitivity – 1 [Youden WJ. Index for rating diagnostic tests. Cancer 3:32-35, 1950. PMID: 15405679]. We have made an effort to provide all the key methodological details requested, but we would be glad to add further information upon specific request.
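      For illustration only, the threshold ordering described above can be sketched as follows. This is not the code used to produce Suppl. Table 1: the function name `youden_table`, the rule that a value at or above the threshold counts as a positive call, and the toy data are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def youden_table(
    values: Sequence[float], labels: Sequence[int], top: int = 10
) -> List[Tuple[float, float, float, float]]:
    """Rank candidate thresholds by Youden's J = sensitivity + specificity - 1.

    Assumption: a sample is called positive when its value is >= threshold.
    Returns up to `top` rows of (threshold, sensitivity, specificity, J),
    ordered from best to worst J.
    """
    pairs = list(zip(values, labels))
    positives = sum(1 for _, y in pairs if y)
    negatives = len(pairs) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative sample")

    rows = []
    for threshold in sorted({v for v, _ in pairs}):
        true_pos = sum(1 for v, y in pairs if y and v >= threshold)
        true_neg = sum(1 for v, y in pairs if not y and v < threshold)
        sensitivity = true_pos / positives
        specificity = true_neg / negatives
        rows.append((threshold, sensitivity, specificity,
                     sensitivity + specificity - 1))

    # Best J first; Python's sort is stable, so ties keep ascending threshold order.
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows[:top]


if __name__ == "__main__":
    # Made-up concentrations (ng/uL) and labels (1 = cancer), not study data.
    toy_values = [4.0, 8.5, 9.0, 12.0, 18.0, 25.0, 31.0, 40.0]
    toy_labels = [0, 0, 0, 1, 0, 1, 1, 1]
    for row in youden_table(toy_values, toy_labels, top=3):
        print(row)
```

      Reporting sensitivity and specificity next to each J value lets a reader choose a threshold that favors one over the other, instead of relying on the single J-optimal cut-off.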

    1. Author response:

      Reviewer 1:

      Summary:

      Identifying drugs that target specific disease phenotypes remains a persistent challenge. Many current methods are only applicable to well-characterized small molecules, such as those with known structures. In contrast, methods based on transcriptional responses offer broader applicability because they do not require prior information about small molecules. Additionally, they can be rapidly applied to new small molecules. One of the most promising strategies involves the use of “drug response signatures”: specific sets of genes whose differential expression can serve as markers for the response to a small molecule. By comparing drug response signatures with expression profiles characteristic of a disease, it is possible to identify drugs that modulate the disease profile, indicating a potential therapeutic connection.

      This study aims to prioritize potential drug candidates and to forecast novel drug combinations that may be effective in treating triple-negative breast cancer (TNBC). Large consortia, such as the LINCS-L1000 project, offer transcriptional signatures across various time points after exposing numerous cell lines to hundreds of compounds at different concentrations. While this data is highly valuable, its direct applicability to pathophysiological contexts is constrained by the challenges in extracting consistent drug response profiles from these extensive datasets. The authors use their method to create drug response profiles for three different TNBC cell lines from LINCS.

      To create a more precise, cancer-specific disease profile, the authors highlight the use of single-cell RNA sequencing (scRNA-seq) data. They focus on TNBC epithelial cells collected from 26 diseased individuals compared to epithelial cells collected from 10 healthy volunteers. The authors are further leveraging drug response data to develop inhibitor combinations.

      Strengths:

      The authors of this study contribute to an ongoing effort to develop automated, robust approaches that leverage gene expression similarities across various cell lines and different treatment regimens, aiming to predict drug response signatures more accurately. The authors are trying to address the gap that remains in computational methods for inferring drug responses at the cell subpopulation level.

      Weaknesses:

      One weakness is that the authors do not compare their method to previous studies. The authors develop a drug response profile by summarizing the time points, concentrations, and cell lines. The computational challenge of creating a single gene list that represents the transcriptional response to a drug across different cell lines and treatment protocols has been previously addressed. The Prototype Ranked List (PRL) procedure, developed by Iorio and co-authors (PNAS, 2010, doi:10.1073/pnas.1000138107), uses a hierarchical majority-voting scheme to rank genes. This method generates a list of genes that are consistently overexpressed or downregulated across individual conditions, which then hold top positions in the PRL. The PRL methodology was used by Aissa and co-authors (Nature Comm 2021, doi:10.1038/s41467-021-21884-z) to analyze drug effects on selective cell populations using scRNA-seq datasets. They combined PRL with Gene Set Enrichment Analysis (GSEA), a method that compares a ranked list of genes like PRL against a specific set of genes of interest. GSEA calculates a Normalized Enrichment Score (NES), which indicates how well the genes of interest are represented among the top genes in the PRL. Compared to the method described in the current manuscript, the PRL method allows for the identification of both upregulated and downregulated transcriptional signatures relevant to the drug’s effects. It also gives equal weight to each cell line’s contribution to the drug’s overall response signature.

      The authors performed experimental validation of the top two identified drugs; however, the effect was modest. In addition, the effect on TNBC cell lines was cell-line specific as the identified drugs were effective against BT20, whose transcriptional signatures from LINCS were used for drug identification, but not against the other two cell lines analyzed. An incorrect choice of genes for the signature may result in capturing similarities tied to experimental conditions (e.g., the same cell line) rather than the drug’s actual effects. This reflects the challenges faced by drug response signature methods in both selecting the appropriate subset of genes that make up the signature and managing the multiple expression profiles generated by treating different cell lines with the same drug.

      We appreciate the reviewer’s thoughtful feedback and their suggestion to refer to the Prototype Ranked List (PRL) manuscript. Unfortunately, because the PRL methodology is not implemented in an open-source package, direct comparison with our approach is challenging. Nonetheless, we investigated whether using ranks would yield similar results for the most likely active drug pairs identified by retriever. To do this, we calculated and compared the rankings of the average effect sizes provided by retriever. Although the Spearman correlation coefficient was high (ρ = 0.98), we observed that key genes are disadvantaged when using ranks compared to effect sizes. This difference is particularly evident in the gene set enrichment analysis, where using average ranks identified only one pathway as statistically significantly enriched. The code to replicate these analyses is available at https://github.com/dosorio/L1000-TNBC/blob/main/Code/.
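      As a rough illustration of the kind of rank comparison described above, a Spearman coefficient between two per-gene score vectors can be computed as below. This is a minimal standard-library sketch, not the analysis in the linked repository; the function names and the toy numbers are assumptions made for the example.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical per-gene scores from two summarization strategies.
    effect_size_profile = [2.9, 0.4, -0.1, -1.8, 0.2]
    rank_based_profile = [5.0, 4.0, 2.0, 1.0, 3.0]
    print(spearman(effect_size_profile, rank_based_profile))
```

      Because the coefficient depends only on ordering, two profiles can correlate very highly while the rank-based one discards the magnitude that separates a strongly responding gene from its neighbors.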

      Author response image 1.

      Given the similarity in purpose between retriever and the PRL approach, we have added the following statement to the introduction: “Previously, this goal was approached using a majority-voting scheme to rank genes across various cell types, concentrations, and time points. This approach generates a prototype ranked list (PRL) that represents the consistent ranks of genes across several cell lines in response to a specific drug.”
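      To make the idea of merging ranked lists concrete, the sketch below combines several ranked gene lists by their summed positions (a Borda-style merge). It is a deliberately simplified stand-in and does not reproduce the published PRL procedure, which the review above describes as a hierarchical majority-voting scheme; the function name and the gene labels are hypothetical.

```python
from typing import Dict, List, Sequence


def borda_merge(rankings: Sequence[Sequence[str]]) -> List[str]:
    """Merge ranked gene lists into one consensus ordering.

    Each input list ranks the same genes from most up-regulated (first)
    to most down-regulated (last). Genes are ordered by their summed
    positions across lists, so every list contributes equally.
    """
    if not rankings:
        raise ValueError("need at least one ranking")
    genes = set(rankings[0])
    for ranking in rankings:
        if len(ranking) != len(genes) or set(ranking) != genes:
            raise ValueError("every ranking must contain the same genes once")

    totals: Dict[str, int] = {gene: 0 for gene in genes}
    for ranking in rankings:
        for position, gene in enumerate(ranking):
            totals[gene] += position

    # Ties are broken alphabetically to keep the output deterministic.
    return sorted(genes, key=lambda gene: (totals[gene], gene))


if __name__ == "__main__":
    # Hypothetical rankings from three treatment conditions.
    per_condition = [
        ["GENE_A", "GENE_B", "GENE_C"],
        ["GENE_A", "GENE_C", "GENE_B"],
        ["GENE_B", "GENE_A", "GENE_C"],
    ]
    print(borda_merge(per_condition))
```

      A position-only merge of this kind weights every condition equally but carries no information about effect magnitude.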

      Regarding the experimental validation, we believe there is a misunderstanding about the evidence we provided. We would like to clarify that we used three different TNBC cell lines: CAL120, BT20, and DU4475. It’s important to note that CAL120 and DU4475 were not included in the signature generation process. Despite this, we observed effects that exceeded the expected additive effects, particularly in the CAL120 cell line (Figure 5, Panel F).

      Reviewer 2:

      Summary:

      In their study, Osorio and colleagues present ‘retriever,’ an innovative computational tool designed to extract disease-specific transcriptional drug response profiles from the LINCS-L1000 project. This tool has been effectively applied to TNBC, leveraging single-cell RNA sequencing data to predict drug combinations that may effectively target the disease. The public review highlights the significant integration of extensive pharmacological data with high-resolution transcriptomic information, which enhances the potential for personalized therapeutic applications.

      Strengths:

      A key finding of the study is the prediction and validation of the drug combination QL-XII-47 and GSK-690693 for the treatment of TNBC. The methodology employed is robust, with a clear pathway from data analysis to experimental confirmation.

      Weaknesses:

      However, several issues need to be addressed. The predictive accuracy of ’retriever’ is contingent upon the quality and comprehensiveness of the LINCS-L1000 and single-cell datasets utilized, which is an important caveat as these datasets may not fully capture the heterogeneity of patient responses to treatment. While the in vitro validation of the drug combinations is promising, further in vivo studies and clinical trials are necessary to establish their efficacy and safety. The applicability of these findings to other cancer types also warrants additional investigation. Expanding the application of ’retriever’ to a broader range of cancer types and integrating it with clinical data will be crucial for realizing its potential in personalized medicine. Furthermore, as the study primarily focuses on kinase inhibitors, it remains to be seen how well these findings translate to other drug classes.

      We thank the reviewer for their thoughtful and constructive feedback. We appreciate your insights and agree that several important considerations need to be addressed.

      We recognize that the predictive accuracy of retriever depends on the LINCS-L1000 and single-cell datasets. These resources may not fully represent the complete range of transcriptional responses to disease and treatment across different patients. As you mentioned, this is an important limitation. However, we believe that by extrapolating the evaluation of the most likely active compound to each individual patient, we can help address this issue. This approach will provide valuable insights into which patients in the study are most likely to respond positively to treatment.

      On the in-vitro validation of drug combinations, we agree that while promising, these results are not sufficient on their own to establish clinical efficacy. Additional in-vivo studies will be essential in assessing the therapeutic potential and safety of these combinations, and clinical trials will be an important next step to validate the translational impact of our findings.

      Lastly, we appreciate the reviewer’s comment about the focus of our study on kinase inhibitors. This result was unexpected, as we tested the full set of compounds from the LINCS-L1000 project. We agree that exploring other top candidates, including different drug classes, will be important for assessing how broadly the retriever approach can be applied.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Pradhan et al investigated the potential gustatory mechanisms that allow flies to detect cholesterol. They found that flies are indifferent to low cholesterol and avoid high cholesterol. They further showed that the ionotropic receptors Ir7g, Ir51b, and Ir56d are important for the cholesterol sensitivity in bitter neurons. The figures are clear and the behavior result is interesting. However, I have several major comments, especially on the discrepancy of the expression of these Irs with other lab published results, and the confusing finding that the same receptors (Ir7g, Ir51b) have been implicated in the detection of various seemingly unrelated compounds.

      Strengths:

      The results are very well presented, the figures are clear and well-made, text is easy to follow.

      Weaknesses:

      (1) Regarding the expression of Ir56d. The reported Ir56d expression pattern contradicts multiple previous studies (Brown et al., 2021 eLife, Figure 6a-c; Sanchez-Alcaniz et al., 2017 Nature Communications, Figure 4e-h; Koh et al., 2014 Neuron, Figure 3b). These studies, using three different driver lines, consistently showed Ir56d expression in sweet-sensing neurons and taste peg neurons. Importantly, Sanchez-Alcaniz et al. demonstrated that Ir56d is not expressed in Gr66a-expressing (bitter) neurons. This discrepancy is critical since Ir56d is identified as the key subunit for cholesterol detection in bitter neurons, and misexpression of Ir7g and Ir51b together is insufficient to confer cholesterol sensitivity (Fig.4b,d). Which Ir56d-GAL4 (and Gr66a-I-GFP) line was used in this study? Is there additional evidence (scRNA sequencing, in-situ hybridization, or immunostaining) supporting Ir56d expression in bitter neurons?

      We agree that the expression pattern of Ir56d diverges from two prior reports. The studies by Brown et al. and Koh et al. employed the same Ir56d-GAL4 driver line, which exhibited expression in sweet-sensing gustatory receptor neurons (GRNs) and taste peg neurons, but not bitter GRNs (the Sanchez-Alcaniz et al. paper did not use an Ir56d-GAL4).

      In our study, we used an Ir56d-GAL4 driver line (KDRC:2307) and the Gr66a-I-GFP reporter line (Weiss et al., 2011 Neuron). This is a crucial distinction, as differences in the regulatory regions used to generate different driver lines are well known to underlie differences in expression patterns. Our double-labeling experiments revealed co-expression of Ir56d with Gr66a-positive bitter GRNs specifically within the S6 and S7 sensilla, types previously shown to exhibit strong electrophysiological responses to cholesterol (Figure 2—figure supplement 1F).

      We believe this observation is biologically significant and consistent with our functional data. Specifically, targeted expression of Ir56d in bitter neurons using the Gr33a-GAL4 was sufficient to rescue cholesterol avoidance behavior in Ir56d<sup>1</sup> mutants (Figure 3G). These results demonstrate that Ir56d plays a functional role in bitter GRNs for cholesterol detection. The convergence of genetic, behavioral, and electrophysiological data presented in our study provides compelling support for this previously unappreciated expression pattern and function of Ir56d.

      (2) Ir51b has previously been implicated in detecting nitrogenous waste (Dhakal 2021), lactic acid (Pradhan 2024), and amino acids (Aryal 2022), all by the same lab. Additionally, both Ir7g and Ir51b have been implicated in detecting cantharidin, an insect-secreted compound that flies may or may not encounter in the wild, by the same lab. Is Ir51b proposed to be a specific receptor for these chemically distinct compounds or a general multimodal receptor for aversive stimuli? Unlike other multimodal bitter receptors, the expression level of Ir51b is rather low and it's unclear which subset of GRNs express this receptor. The chemical diversity among nitrogenous waste, amino acids, lactic acid, cantharidin, and cholesterol raises questions about the specificity of these receptors and warrants further investigation and at a minimum discussion in this paper. Given the wide and seemingly unrelated sensitivity of Ir51b and Ir7g to these compounds I'm leaning towards the hypothesis that at least some of these is non-specific and ecologically irrelevant without further supporting evidence from the authors.

      While it is true that IR51b and IR7g are responsive to a range of compounds, they share chemical features such as nitrogen-containing groups, hydrophobicity, or amphipathic structures, suggesting that recognition of these chemicals may be mediated by the same or overlapping domains within the receptor complexes. These features could facilitate binding to a structurally diverse yet chemically related group of aversive ligands.

      In the case of cholesterol, while its sterol ring system is distinct from the other compounds, it shares hydrophobic and amphipathic properties that may enable interaction with these receptors via similar structural motifs. Importantly, our data demonstrate that Ir51b and Ir7g are necessary but not sufficient on their own to confer cholesterol sensitivity, indicating that additional co-factors or receptor subunits are required for full functionality (Figure 4B, D). Furthermore, our dose-response analysis (Figure 3F) shows that Ir7g is particularly important at higher cholesterol concentrations, supporting the idea of graded sensitivity rather than indiscriminate activation. This suggests that these receptors may have evolved to recognize cholesterol and its analogs (e.g., phytosterols such as stigmasterol, yet to be tested), which are naturally found in the fly’s diet (e.g., yeast and plant-derived matter), as ecologically relevant cues signaling microbial contamination, lipid imbalance, or dietary overconsumption.

      We acknowledge the reviewer’s concern regarding the relatively low expression levels of Ir51b and Ir7g. However, we note that low transcript abundance does not necessarily equate to diminished physiological relevance. Finally, we agree that the chemical diversity of ligands associated with Ir51b and Ir7g warrants deeper investigation, particularly through structure-function studies aimed at identifying ligand-binding domains and receptor-ligand interactions at atomic resolution.

      (3) The Benton lab Ir7g-GAL4 reporter shows no expression in adults. Additionally, two independent labellar RNA sequencing studies (Dweck, 2021 eLife; Bontonou et al., 2024 Nature Communications) failed to detect Ir7g expression in the labellum. This contradicts the authors' previous RT-PCR results (Pradhan 2024 Fig. S4, Journal of Hazardous Materials) showing Ir7g expression in the labellum. Additionally the Benton and Carlson lab Ir51b-GAL4 reporters show no expression in adults as well. Please address these inconsistencies.

      With respect to Ir7g, we acknowledge that the Ir7g-GAL4 reporter line from the Benton lab does not exhibit detectable expression in adult labella. Furthermore, two independent transcriptomic studies, Dweck et al., 2021 (eLife) and Bontonou et al., 2024 (Nature Communications), also did not detect Ir7g transcripts in bulk RNA-seq datasets derived from adult labella. However, our previously published RT-PCR data (Pradhan et al., 2024, Journal of Hazardous Materials, Fig. S4) revealed Ir7g expression in labellar tissue, albeit at low levels. Our RT-PCR included an internal control (tubulin) amplified in the same reaction tube for both the control and the Ir7g mutant, with the mutant serving as a negative control. Therefore, we stand behind the finding that Ir7g is expressed in the labellum.

      We would like to point out that RT-PCR is more sensitive and better suited to detecting low-abundance transcripts than bulk RNA-seq, which may fail to capture transcripts due to limitations in depth of coverage. Moreover, immunohistochemistry can have limitations in detecting very low expression levels. Costa et al. 2013 (Translational Lung Cancer Research) state that “RNA-Seq technique will not likely replace current RT-PCR methods, but will be complementary depending on the needs and the resources as the results of the RNA-Seq will identify those genes that need to then be examined using RT-PCR methods”.

      Similarly, regarding Ir51b, while the GAL4 reporter lines from the Benton and Carlson labs do not show robust adult expression, our RT-PCR and functional data strongly support a role for Ir51b in labellar bitter GRNs. Specifically, Ir51b<sup>1</sup> mutants display electrophysiological deficits in response to cholesterol (Figure 2A–B), and these defects are rescued by expressing Ir51b in Gr33a-positive bitter neurons (Figure 3G), providing functional validation of the RT-PCR expression.

      (4) The premise that high cholesterol intake is harmful to flies, which makes sensory mechanisms for cholesterol avoidance necessary, is interesting but underdeveloped. Animal sensory systems typically evolve to detect ecologically relevant stimuli with dynamic ranges matching environmental conditions. Given that Drosophila primarily consume fruits and plant matter (which contain minimal cholesterol) rather than animal-derived foods (which contain higher cholesterol), the ecological relevance of cholesterol detection requires more thorough discussion. Furthermore, at high concentrations, chemicals often activate multiple receptors beyond those specifically evolved for their detection. If the cholesterol concentrations used in this study substantially exceed those encountered in the fly's natural diet, the observed responses may represent an epiphenomenon rather than an ecologically and ethologically relevant sensory mechanism. What is the cholesterol content in flies' diet and how does that compare to the concentrations used in this paper?

      Drosophila melanogaster cannot synthesize sterols de novo, and must acquire them from its diet. In natural environments, flies acquire sterols from fermenting fruit, decaying plant matter, and yeast, which contain trace amounts of phytosterols (e.g., stigmasterol, β-sitosterol) and ergosterol. While the exact sterol concentrations in these sources remain uncharacterized, our behavioral assays used concentrations (0.001–0.01% by weight) that align with the low levels expected in such nutrient-limited ecological niches.

      In our study, the cholesterol concentrations tested ranged from 0.001% to 0.1%, thereby spanning both the physiologically relevant and slightly elevated range. Importantly, avoidance behaviors and receptor activation were most prominent at 0.1% cholesterol. While it is true that high chemical concentrations may elicit off-target effects via broad receptor activation, our genetic and electrophysiological data indicate that the observed responses are mediated by specific ionotropic receptors (Ir51b, Ir7g, Ir56d) and not merely generalized chemical stress.

      Ecologically, elevated sterol levels may also signal conditions unsuitable for egg-laying or larval development. For example, high levels of cholesterol or other sterols may occur in substrates colonized by pathogenic microbes, decaying animal tissue, or in cases of abnormal microbial fermentation, which could represent a nutritional or microbial hazard. Cholesterol avoidance may thus help flies avoid consuming decaying animal tissue. In this context, sensory detection of excessive cholesterol might serve a protective function.

      Reviewer #2 (Public review):

      Summary:

      In Cholesterol Taste Avoidance in Drosophila melanogaster, Pradhan et al. used behavioral and electrophysiological assays to demonstrate that flies can: (1) detect cholesterol through a subset of bitter-sensing gustatory receptor neurons (GRNs) and (2) avoid consuming food with high cholesterol levels. Mechanistically, they identified five members of the IR family as necessary for cholesterol detection in GRNs and for the corresponding avoidance behavior. Ectopic expression experiments further suggested that Ir7g + Ir56d or Ir51b + Ir56d may function as tuning receptors for cholesterol detection, together with the Ir25a and Ir76b co-receptors.

      Strengths:

      The experimental design of this study was logical and straightforward. Leveraging their expertise in the Drosophila taste system, the research team identified the molecular and cellular basis of a previously unrecognized taste category, expanding our understanding of gustation. A key strength of the study was its combination of electrophysiological recordings with behavioral genetic experiments.

      Weaknesses:

      My primary concern with this study is the lack of a systematic survey of the IRs of interest in the labellum GRNs. Consequently, there is no direct evidence linking the expression of putative cholesterol IRs to the B GRNs in the S6 and S7 sensilla.

      Specifically, the authors need to demonstrate that the IR expression pattern explains cholesterol sensitivity in the B GRNs of S6 and S7 sensilla, but not in other sensilla. Instead of providing direct IR expression data for all candidate IRs (as shown for Ir56d in Figure 2-figure supplement 1F), the authors rely on citations from several studies (Lee, Poudel et al. 2018; Dhakal, Sang et al. 2021; Pradhan, Shrestha et al. 2024) to support their claim that Ir7g, Ir25a, Ir51b, and Ir76b are expressed in B GRNs (Lines 192-194). However, none of these studies provide GAL4 expression or in situ hybridization data to substantiate this claim.

      Without a comprehensive IR expression profile for GRNs across all taste sensilla, it is difficult to interpret the ectopic expression results observed in the B GRN of the I9 sensillum or the A GRN of the L-sensillum (Figure 4). It remains equally plausible that other tuning IRs, beyond the co-receptors Ir25a and Ir76b, could interact with the ectopically expressed IRs to confer cholesterol sensitivity, rather than the proposed Ir7g + Ir56d or Ir51b + Ir56d combinations.

      We provide electrophysiological data demonstrating that the S6 and S7 sensilla respond to cholesterol (Figure 1D). This finding is consistent with the hypothesis that these sensilla harbor the complete receptor complexes necessary for cholesterol detection. In our electrophysiological recordings, only those bitter GRNs that co-express Ir56d along with either Ir7g or Ir51b generate action potentials in response to cholesterol. Other S-type sensilla lacking one or more of these subunits remain unresponsive, reinforcing the idea that these components are necessary for receptor function and sensory coding of cholesterol. Moreover, in the cholesterol-insensitive I9 sensillum (based on our mapping results using electrophysiology), co-expression of either Ir7g + Ir56d or Ir51b + Ir56d conferred de novo cholesterol sensitivity (Figure 4B). Importantly, no cholesterol response was observed when any of these IRs was expressed alone or when Ir7g + Ir51b were co-expressed without Ir56d. These findings strongly argue against the possibility that endogenous tuning IRs in I9 sensilla (e.g., Ir25a, Ir76b) are sufficient to generate cholesterol responsiveness.

      Furthermore, based on the literature, Ir25a and Ir76b are endogenously expressed in I- and L-type sensilla. Thus, their presence alone is insufficient for cholesterol responsiveness. These data support the model that cholesterol sensitivity depends on a specific, multi-subunit receptor complex (e.g., Ir7g + Ir25a + Ir56d + Ir76b or Ir51b + Ir25a + Ir56d + Ir76b).

      In conclusion, while we acknowledge that our data do not provide a full anatomical map of IR expression across all sensilla, our results strongly support the idea that cholesterol sensitivity in S6 and S7 sensilla arises from specific combinations of IRs expressed in the B GRNs.

      Reviewer #3 (Public review):

      Summary:

      Whether and how animals can taste cholesterol is not well understood. The study provides evidence that 1) cholesterol activates a subset of bitter-sensing gustatory receptor neurons (GRNs) in the fly labellum, but not other types of GRNs, 2) flies show aversion to high concentrations of cholesterol, and this is mediated by bitter GRNs, and 3) cholesterol avoidance depends on a specific set of ionotropic receptor (IR) subunits acting in bitter GRNs. The claims of the study are supported by electrophysiological recordings, genetic manipulations, and behavioral readouts.

      Strengths:

      Cholesterol taste has not been well studied, and the paper provides new insight into this question. The authors took a comprehensive and rigorous approach in several different parts of the paper, including screening the responses of all 31 labellar sensilla, screening a large panel of receptor mutants, and performing misexpression experiments with nearly every combination of the 5 IRs identified. The effects of the genetic manipulations are very clear and the results of electrophysiological and behavioral studies match nicely, for the most part. The appropriate controls are performed for all genetic manipulations.

      Weaknesses:

      The weaknesses of the study, described below, are relatively minor and do not detract from the main conclusions of the paper.

      (1) The paper does not state what concentrations of cholesterol are present in Drosophila's natural food sources. Are the authors testing concentrations that are ethologically relevant?

      Drosophila melanogaster primarily feeds on fermenting fruits and associated microbial communities, especially yeast, which serve as major sources of dietary sterols. These natural food sources are known to contain phytosterols such as stigmasterol and β-sitosterol. One study quantified phytosterols (e.g., stigmasterol, sitosterol) in fruits, reporting concentrations between 1.6–32.6 mg/100 g edible portion (~0.0016–0.0326% wet weight) (Han et al., 2008). The concentrations we tested fall within this range. Additionally, ergosterol, the principal sterol in yeast and a structural analog of cholesterol, is present at levels of about 0.005% to 0.02% in yeast-rich environments.
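
      The unit conversion behind the fruit phytosterol figures quoted above can be checked with a short sketch. The function name is hypothetical and chosen only for illustration; the two input values are the ones cited from Han et al. (2008).

```python
def mg_per_100g_to_percent(mg_per_100g: float) -> float:
    """Convert mg per 100 g into percent by weight (g per 100 g)."""
    # 1 g = 1000 mg, so mg per 100 g divided by 1000 gives g per 100 g, i.e. percent.
    return mg_per_100g / 1000.0


# Reported phytosterol range in fruit (mg per 100 g edible portion).
low_percent = mg_per_100g_to_percent(1.6)
high_percent = mg_per_100g_to_percent(32.6)

print(f"{low_percent:.4f}% to {high_percent:.4f}% wet weight")
```

      This reproduces the approximate 0.0016–0.0326% wet-weight range stated in the response.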

      To ensure physiological relevance, we designed our behavioral assays to include a broad concentration range of cholesterol, from 10<sup>-5</sup>% to 10<sup>-1</sup>%. This spans both physiological levels (0.001–0.01%), which are comparable to those found in the natural diet, and supra-physiological levels (e.g., 0.1%), which exceed natural exposure but help define the threshold for aversive behavior.

      Our results demonstrate that flies begin to avoid cholesterol at concentrations of 10<sup>-3</sup>% or more (Figure 3A), which falls within the upper physiological range and may reflect the threshold beyond which cholesterol or related sterols become deleterious. At these higher concentrations, excess sterols may disrupt membrane fluidity, interfere with hormone signaling, or promote microbial overgrowth, all of which could compromise fly health.

      (2) The paper does not state or show whether the expression of IR7g, IR51b, and IR56d is confined to bitter GRNs. Bitter-specific expression of at least some of these receptors would be necessary to explain why bitter GRNs but not sugar GRNs (or other GRN types) normally show cholesterol responses.

      We show that Ir56d-Gal4 is co-expressed with Gr66a-GFP in S6/S7 sensilla, indicating that it is expressed in bitter GRNs (Figure 2-figure supplement 1F). In the case of Ir7g and Ir51b, there are no reporters or antibodies available to address expression. However, they have previously been shown by RT-PCR to be expressed in bitter GRNs (Dhakal et al. 2021, Communications Biology; Pradhan et al. 2024, Journal of Hazardous Materials). In addition, we provide functional evidence that bitter GRNs are required for the cholesterol response, since silencing bitter GRNs abolishes cholesterol-induced action potentials (Figure 1E–F). Moreover, we showed that we could rescue the Ir7g<sup>1</sup>, Ir51b<sup>1</sup> and Ir56d<sup>1</sup> mutant phenotypes only when we expressed the cognate transgenes in bitter GRNs using Gr33a-GAL4 (Figure 3G). Thus, while Ir7g/Ir51b are not exclusive to bitter GRNs, their functional role in cholesterol detection is bitter-GRN-specific.

      (3) The authors only investigated the responses of GRNs in the labellum, but GRN responses in the leg may also contribute to the avoidance of cholesterol feeding. Alternatively, leg GRNs might contribute to cholesterol attraction that is unmasked when bitter GRNs are silenced. In support of this possibility, Ahn et al. (2017) showed that Ir56d functions in sugar GRNs of the leg to promote appetitive responses to fatty acids.

      This is an interesting idea. Indeed, when bitter GRNs are hyperpolarized, the flies exhibit a strong attraction to cholesterol. Nevertheless, the cellular basis for cholesterol attraction, and whether it is mediated by GRNs in the legs, will require future investigation.

      (4) The authors might consider using proboscis extension as an additional readout of taste attraction or aversion, which would help them more directly link the labellar GRN responses to a behavioral readout. Using food ingestion as a readout can conflate the contribution of taste with post-ingestive effects, and the regulation of food ingestion also may involve contributions from GRNs on multiple organs, whereas organ-specific contributions can be dissociated using proboscis extension. For example, does presenting cholesterol on the proboscis lead to aversive responses in the proboscis extension assay (e.g., suppression of responses to sugar)? Does this aversion switch to attraction when bitter GRNs are silenced, as with the feeding assay?

      We thank the reviewer for the suggestion regarding the use of the proboscis extension reflex (PER) assay to strengthen the link between labellar GRN activity and behavioral responses to cholesterol.

      Author response image 1.

      Our PER assay results shown above indicate that cholesterol presentation on the labellum or forelegs leads to an aversive response, as evidenced by a significant reduction in proboscis extension when compared to control stimuli (Author response image 1A. 2% sucrose or 2% sucrose with 10<sup>-1</sup>% cholesterol was applied to labellum or forelegs and the percent PER was recorded. n=6. Data were compared using single-factor ANOVA coupled with Scheffe’s post-hoc test. Statistical significance was compared with the control. Means ± SEMs. **p<0.01). This finding supports the idea that cholesterol is detected by labellar and leg GRNs and elicits behavioral avoidance. In contrast, sucrose stimulation robustly induces proboscis extension, as expected for an appetitive stimulus. We confirmed the defects due to each Ir mutant by presenting the stimuli to the labellum (Author response image 1B). Together, these PER results provide a more direct behavioral correlate of labellar and leg GRN activation and reinforce our conclusion that cholesterol is sensed as an aversive tastant through the labellar bitter GRNs.

      (5) The authors claim that the cholesterol receptor is composed of IR25a, IR76b, IR56d, and either IR7g or IR51b. While the authors have shown that IR25a and IR76b are each required for cholesterol sensing, they did not show that both are required components of the same receptor complex. If the authors are relying on previous studies to make this assumption, they should state this more clearly. Otherwise, I think further misexpression experiments may be needed where only IR25a or IR76b, but not both, are expressed in GRNs.

      In our study, we relied on prior work demonstrating that Ir25a and Ir76b function as broadly required co-receptors in most IR-dependent chemosensory pathways (Ganguly et al., 2017; Lee et al., 2018). These studies showed that Ir25a and Ir76b are co-expressed in many GRNs across multiple taste modalities. Functional IR complexes often fail to form or signal properly in the absence of these co-receptors. Thus, it is widely accepted in the field that Ir25a and Ir76b function together as a core heteromeric scaffold for diverse IR complexes, akin to co-receptors in other ionotropic glutamate receptor families. We state that while Ir25a and Ir76b are presumed co-receptors in the cholesterol receptor complex based on their conserved roles, their direct physical interaction with Ir7g, Ir51b, and Ir56d remains to be demonstrated.

      In support of this model, we note that in our ectopic expression experiments using I9 sensilla, which endogenously express Ir25a and Ir76b, introduction of either Ir7g + Ir56d or Ir51b + Ir56d was sufficient to confer cholesterol sensitivity (Figure 4B). We obtained a similar result in L6 sensilla (Figure 4D), which also endogenously express Ir25a and Ir76b. These findings imply that both co-receptors are already present in these sensilla and are likely part of the functional complex. However, we agree that we have not directly tested the requirement for both co-receptors in a minimal reconstitution context, such as expressing only Ir25a or Ir76b alongside tuning IRs in an otherwise null background. Such an experiment would indeed provide more direct evidence of their joint requirement in the receptor complex. Future studies, including heterologous expression experiments, will be necessary to define the cholesterol-receptor complexes.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors introduce a computational model that simulates the dendrites of developing neurons in a 2D plane, subject to constraints inspired by known biological mechanisms such as diffusing trophic factors, trafficked resources, and an activity-dependent pruning rule. The resulting arbors are analyzed in terms of their structure, dynamics, and responses to certain manipulations. The authors conclude that 1) their model recapitulates a stereotyped timecourse of neuronal development: outgrowth, overshoot, and pruning 2) Neurons achieve near-optimal wiring lengths, and Such models can be useful to test proposed biological mechanisms- for example, to ask whether a given set of growth rules can explain a given observed phenomenon - as developmental neuroscientists are working to understand the factors that give rise to the intricate structures and functions of the many cell types of our nervous system.

      Overall, my reaction to this work is that this is just one instantiation of many models that the author could have built, given their stated goals. Would other models behave similarly? This question is not well explored, and as a result, claims about interpreting these models and using them to make experimental predictions should be taken warily. I give more detailed and specific comments below.

      We thank the reviewer for the summary of the work. We find that the criticism “that this is one instantiation of many models [we] could have built” can apply to any model. George Box's dictum that “all models are wrong, but some models are useful” was the motto that drove our modeling approach. In principle, there are infinitely many possible models. We chose one of the most minimalistic models that implements known biological mechanisms, including activity-independent and -dependent phases of dendritic growth, and constrained parameters based on experimental data. We compare the proposed model to other alternatives in the Discussion section, especially to the models of Hermann Cuntz, which propose very different strategies for growth.

      However, the reviewer is right that within the type of model we chose, we could have more extensively explored the sensitivity to parameters. In the revised manuscript we will investigate the sensitivity of model output to variations of specific parameters, as explained below.

      Point 1.1. Line 109. After reading the rest of the manuscript, I worry about the conclusion voiced here, which implies that the model will extrapolate well to manipulations of all the model components. How were the values of model parameters selected? The text implies that these were selected to be biologically plausible, but many seem far off. The density of potential synapses, for example, seems very low in the simulations compared to the density of axons/boutons in the cortex; what constitutes a potential synapse? The perfect correlations between synapses in the activity groups are flawed, even for synapses belonging to the same presynaptic cell. The density of postsynaptic cells is also orders of magnitude off, etc. Ideally, every claim made about the model's output should be supported by a parameter sensitivity study. The authors performed few explorations of parameter sensitivity and many of the choices made seem ad hoc.

      It is indeed important to clarify how the model parameters were selected. Here we provide a short justification for some of these parameters, which will be included in the revised manuscript.

      1) Potential synapse density: We modelled 1,500 potential synapses in a cortical sheet of size 185 x 185 μm. We used 1 pixel per μm to capture approximately 1 μm thick dendrites. Therefore, we started with an initial density of 0.044 potential synapses per μm^2. From Author Response Image 1 (which will be included in the revised submission), we can see that at the end of our simulation time ~1,000 potential synapses remain. So in fact, the density of potential synapses is sufficient, since not many potential synapses end up connected. The rapid slowing down of growth in our model is not due to a depletion of potential synaptic partners, as the number of potential synapses remains high. Nonetheless, we will explore this in the revised manuscript.

      2) Stabilized synapse density: Since ~1,000 of the potential synapses in the modeled cortical sheet remain available, ~500 become connected to the dendrites of the 9 somas in the modeled cortical sheet. This means that the density of stable connected synapses is approximately 0.015 synapses per μm^2. This is also the number that is shown in Figure 3b, which is about 60 synapses stabilized per cell. This density is much easier to compare to experimental data, and below we provide some numbers from literature we already cited in the manuscript as well as a recent preprint.

      In the developing cortex:

      • Leighton, Cheyne and Lohmann 2023 https://doi.org/10.1101/2023.03.02.530772 find up to 0.4 synapses per μm in pyramidal neurons in vivo in the developing mouse visual cortex at P8 to P13. This is almost identical to our value of 0.4 synapses per μm.

      • Ultanir et al., 2007 https://doi.org/10.1073/pnas.0704031104 find 0.7 to 1.7 spines per μm in pyramidal neurons in vivo in L2/3 of the developing mouse cortex, at P10 to P20.

      • Glynn et al., 2011 https://doi.org/10.1038/nn.2764 find 0.1 to 0.7 spines per μm^2 in pyramidal neurons in vivo and in vitro in L2/3 of the developing mouse cortex, at P8 to P60.

      In the developing hippocampus:

      Although these values vary somewhat across experiments, in most cases they are in agreement with our chosen values, especially when taking into account that we are modeling development (rather than adulthood).

      3) Soma/neuron density: Indeed, we did not explicitly mention this number anywhere in the paper. But from the figures we can infer 9 somas growing dendrites on an area of ~34,000 μm^2. Thus, neuron density would be approximately 300 neurons per mm^2. This number seems a bit low after a short search through the literature. For example, Keller et al., 2018 https://www.frontiersin.org/articles/10.3389/fnana.2018.00083/full report about 90,000 neurons per mm^3, albeit in adulthood.
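
      As a quick check of the arithmetic in points 1) to 3), the densities can be recomputed from the quantities stated in this response. This is only a sketch: the variable names are hypothetical, and the counts of remaining and connected synapses are the approximate values quoted above, not exact simulation output.

```python
# Quantities taken from the response (approximate where marked).
SIDE_UM = 185.0            # side length of the modeled sheet, in micrometres
AREA_UM2 = SIDE_UM ** 2    # 34,225 square micrometres
N_POTENTIAL = 1500         # potential synapses at the start
N_CONNECTED = 500          # approximate number of stabilized synapses
N_SOMAS = 9                # somas in the modeled sheet

potential_density = N_POTENTIAL / AREA_UM2    # potential synapses per um^2
connected_density = N_CONNECTED / AREA_UM2    # stabilized synapses per um^2
synapses_per_cell = N_CONNECTED / N_SOMAS     # stabilized synapses per soma
somas_per_mm2 = N_SOMAS / AREA_UM2 * 1e6      # 1 mm^2 = 1e6 um^2

print(f"potential synapses per um^2: {potential_density:.3f}")
print(f"connected synapses per um^2: {connected_density:.3f}")
print(f"connected synapses per cell: {synapses_per_cell:.1f}")
print(f"somas per mm^2:              {somas_per_mm2:.0f}")
```

      The computed values (about 0.044 and 0.015 synapses per μm^2, about 56 synapses per cell, and about 263 somas per mm^2) are consistent with the rounded figures given in the text.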

      We are also performing a sensitivity analysis where some of these parameters are varied and will include this in the revised manuscript. In particular:

      (1) We will vary the nature of the input correlations. In the current model, the synapses in each correlated group receive spike trains with a perfect correlation and there are no correlations across the groups. We will reduce the correlations within group and add non-zero correlations across the groups.

      (2) We will vary the density of the neuronal somas. We expect that higher densities of somas will either yield smaller dendritic areas because the different neurons compete more or result in a state where nearby neurons have to complement each other regarding their activity preferences.

      (3) We will introduce dynamics in the potential synapses to model the dynamics of axons. We plan to explore several scenarios. We could introduce a gradual increase in the density of potential synapses and implement a cap on the number of synapses that can be alive at the same time, and vary that cap. We could also introduce a lifetime of each synapse (following for example a lognormal distribution). A potential synapse can disappear if it does not form a stable synapse in its lifetime, in which case it could move to a different location.
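
      One of the scenarios in item (3), in which each potential synapse has a lognormally distributed lifetime and relocates if it has not stabilized by then, could be prototyped along the following lines. This is a toy sketch and not the model from the manuscript: the function name, the lognormal parameters, the number of steps, and the constant per-step stabilization probability (a stand-in for the activity-dependent rule) are all assumptions made for illustration. Keeping the number of sites fixed acts as a simple cap on how many potential synapses are alive at once.

```python
import random

SHEET_UM = 185.0      # side of the modeled sheet in micrometres (from the response)
N_POTENTIAL = 1500    # number of potential synapses (from the response)


def simulate_turnover(steps=200, mu=3.0, sigma=0.5, p_stabilize=0.002, seed=0):
    """Toy turnover of potential synapses with lognormal lifetimes.

    steps, mu, sigma and p_stabilize are hypothetical values chosen only
    for illustration. Returns (stabilized synapses, relocations).
    """
    rng = random.Random(seed)

    def new_site():
        # A potential synapse at a random location with a sampled lifetime.
        return {
            "x": rng.uniform(0.0, SHEET_UM),
            "y": rng.uniform(0.0, SHEET_UM),
            "age": 0,
            "lifetime": rng.lognormvariate(mu, sigma),
            "stable": False,
        }

    sites = [new_site() for _ in range(N_POTENTIAL)]
    relocations = 0
    for _ in range(steps):
        for i, site in enumerate(sites):
            if site["stable"]:
                continue  # stabilized synapses no longer turn over
            site["age"] += 1
            if rng.random() < p_stabilize:
                site["stable"] = True
            elif site["age"] > site["lifetime"]:
                # Lifetime expired without stabilization: move elsewhere.
                sites[i] = new_site()
                relocations += 1
    n_stable = sum(site["stable"] for site in sites)
    return n_stable, relocations


n_stable, relocations = simulate_turnover()
print(n_stable, relocations)
```

      Varying `mu`, `sigma`, or the number of sites in such a sketch would be one way to probe how sensitive the number of stabilized synapses is to axonal turnover before committing to a full implementation.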

      Point 1.2. Many potentially important phenomena seem to be excluded. I realize that no model can be complete, but the choice of which phenomena to include or exclude from this model could bias studies that make use of it and is worth serious discussion. The development of axons is concurrent with dendrite outgrowth, is highly dynamic, and perhaps better understood mechanistically. In this model, the inputs are essentially static. Growing dendrites acquire and lose growth cones that are associated with rapid extension, but these do not seem to be modeled. Postsynaptic firing does not appear to be modeled, which may be critical to activity-dependent plasticity. For example, changes in firing are a potential explanation for the global changes in dendritic pruning that occur following the outgrowth phase.

      As the reviewer concludes, no model can be complete. In agreement with this, here we would like to quote a paragraph from a very nice paper by Larry Abbott (“Theoretical Neuroscience Rising”, Neuron 2008 https://www.sciencedirect.com/science/article/pii/S0896627308008921), which, although published more than 10 years ago, still applies today:

      “Identifying the minimum set of features needed to account for a particular phenomenon and describing these accurately enough to do the job is a key component of model building. Anything more than this minimum set makes the model harder to understand and more difficult to evaluate. The term ‘realistic’ model is a sociological rather than a scientific term. The truly realistic model is as impossible and useless a concept as Borges’ ‘map of the empire that was of the same scale as the empire and that coincided with it point for point’ (Borges, 1975). […] The art of modeling lies in deciding what this subset should be and how it should be described.”

      We have clearly stated in the Introduction (e.g. lines 37-75) which phenomena we include in the model and why. The Discussion also compares our model to others (lines 315-373), pointing out that most models focus on either activity-independent or activity-dependent phases. We include both, combining literature on molecular gradients and growth factors with activity-dependent connectivity refinements instructed by spontaneous activity. We could not think of a more tractable, more minimalist model that would include both activity-independent and activity-dependent aspects. Therefore, we feel that the current manuscript provides sufficient motivation as well as a discussion of the limitations of the current model.

      Regarding including the concurrent development of axons, we agree this is very interesting and currently not addressed in the model. As noted at the bottom of our reply to point 1.1, bullet (3) we are now revising the manuscript to include a simplified form of axonal dynamics by allowing changes in the lifetime and location of potential synapses, which come from axons of presynaptic partners.

      Regarding postsynaptic firing, this is indeed super relevant and an important point to consider. In one of our recent publications (Kirchner and Gjorgjieva, 2021 https://www.nature.com/articles/s41467-021-23557-3), we studied only an activity-dependent model for the organization of synaptic inputs on non-growing dendrites which have a fixed length. There, we considered the effect of postsynaptic firing and demonstrated that it plays an important role in establishing a global organization of synapses on the entire dendritic tree of the neuron, and not just local dendritic branches. For example, we showed that it could lead to the emergence of retinotopic maps, which have been found experimentally (Iacaruso et al., 2017 https://www.nature.com/articles/nature23019). Since we use the same activity-dependent plasticity model in this paper, we expect that the somatic firing will have the same effect on establishing synaptic distributions on the entire dendritic tree. We will make a note of this in the Discussion in the revised paper.

      Point 1.3. Line 167. There are many ways to include activity -independent and -dependent components into a model and not every such model shows stability. A key feature seems to be that larger arbors result in reduced growth and/or increased retraction, but this could be achieved in many ways (whether activity dependent or not). It's not clear that this result is due to the combination of activity-dependent and independent components in the model, or conceptually why that should be the case.

      We never argued for model uniqueness. There are always going to be many different models (at different spatial and temporal scales, at different levels of abstraction). We can never study all of them and like any modeling study in systems neuroscience we have chosen one model approach and investigated this approach. We do compare the current model to others in the Discussion. If the reviewers have a specific implementation that we should compare our model to as an alternative, we could try, but not if this means doing a completely separate project.

      Point 1.4. Line 183. The explanation of overshoot in terms of the different timescales of synaptic additions versus activity-dependent retractions was not something I had previously encountered and is an interesting proposal. Have these timescales been measured experimentally? To what extent is this a result of fine-tuning of simulation parameters?

      We found that varying the amount of BDNF controls the timescale of the activity-dependent plasticity (see our Figure 5c). Hence, changing the balance between synaptic additions vs. retractions is already explored in Figure 5e and f. Here we show that the overshoot and retraction do not have to be fine-tuned, but may be abolished if there is too much activity-dependent plasticity.

      Regarding the relative timescales of synaptic additions vs. retractions: since the first is mainly due to activity-independent factors, and the second due to activity-dependent plasticity, the question is really about the timescales of the latter two. As we write in the Introduction (lines 60-62), manipulating activity-dependent synaptic transmission has been found to not affect morphology but rather the density and specificity of synaptic connections (Ultanir et al. 2007 https://doi.org/10.1073/pnas.0704031104), supporting the sequential model we have (although we do not impose the sequence, as both activity-independent and activity-dependent mechanisms are always “on”; but note that activity-dependent plasticity can only operate on synapses that have already formed).

      Point 1.5. Line 203. This result seems at odds with results that show only a very weak bias in the tuning distribution of inputs to strongly tuned cortical neurons (e.g. work by Arthur Konnerth's group). This discrepancy should be discussed.

      First, we note that the correlated activity experienced by our modeled synapses (and resulting synaptic organization) does not necessarily correspond to visual orientation, or any stimulus feature, for that matter.

      Nonetheless, this is a very interesting question and there is some variability in what the experimental data show. Many studies have shown that synapses on dendrites are organized into functional synaptic clusters: across brain regions, developmental ages and diverse species from rodent to primate (Kleindienst et al. 2011; Takahashi et al. 2012; Winnubst et al. 2015; Gökçe et al., 2016; Wilson et al. 2016; Iacaruso et al., 2017; Scholl et al., 2017; Niculescu et al. 2018; Kerlin et al. 2019; Ju et al. 2020). Interestingly, some in vivo studies have reported lack of fine-scale synaptic organization (Varga et al. 2011; X. Chen et al. 2011; T.-W. Chen et al. 2013; Jia et al. 2010; Jia et al. 2014), while others reported clustering for different stimulus features in different species. For example, dendritic branches in the ferret visual cortex exhibit local clustering of orientation selectivity but do not exhibit global organization of inputs according to spatial location and receptive field properties (Wilson et al. 2016; Scholl et al., 2017). In contrast, synaptic inputs in mouse visual cortex do not cluster locally by orientation, but only by receptive field overlap, and exhibit a global retinotopic organization along the proximal-distal axis (Iacaruso et al., 2017). We proposed a theoretical framework to reconcile these data: combining activity-dependent plasticity similar to the BDNF-proBDNF model that we used in the current work, and a receptive field model for the different species (Kirchner and Gjorgjieva, 2021 https://www.nature.com/articles/s41467-021-23557-3). We can mention this aspect in the revised manuscript.

      Point 1.6. Line 268. How does the large variability in the size of the simulated arbors relate to the relatively consistent size of arbors of cortical cells of a given cell type? This variability suggests to me that these simulations could be sensitive to small changes in parameters (e.g. to the density or layout of presynapses).

      As noted at the bottom of our reply to point 1.1, bullet (3) we are now revising the manuscript to include changes in the lifetime and location of potential synapses.

      Point 1.7. The modeling of dendrites as two-dimensional will likely limit the usefulness of this model. Many phenomena- such as diffusion, random walks, topological properties, etc - fundamentally differ between two and three dimensions.

      The reviewer is right that there are differences between two and three dimensions. But a simpler model is not a useless model, even if it is not completely realistic. We have ongoing work that extends the current model to 3D, but this is beyond the scope of the current paper. In systems neuroscience, such simplified geometric assumptions about networks have yielded very interesting results. For instance, the one-dimensional ring model has been used to uncover fundamental insights about computations, even though it is highly simplified and abstracted.

      Point 1.8. The description of wiring lengths as 'approximately optimal' in this text is problematic. The plotted data show that the wiring lengths are several deviations away from optimal, and the random model is not a valid instantiation of the 2D non-overlapping constraints the authors imposed. A more appropriate null should be considered.

      Our use of the term “optimal” was not in line with previous literature. We wrongly referred to the minimal wiring length as the optimal wiring length, but neurons can optimize their wiring not only by minimizing their dendritic length (e.g. work of Hermann Cuntz). In the revised manuscript, we will replace the term “optimal wiring” with “minimal wiring”. Then we will compare the wiring length in the model with the theoretically minimal wiring length, the random wiring length and the actual data.

      Point 1.9. It's not clear to me what the authors are trying to convey by repeatedly labeling this model as 'mechanistic'. The mechanisms implemented in the model are inspired by biological phenomena, but the implementations have little resemblance to the underlying biophysical mechanisms. Overall my impression is that this is a phenomenological model intended to show under what conditions particular patterns are possible. Line 363, describing another model as computational but not mechanistic, was especially unclear to me in this context.

      What we mean by mechanistic is that we implement equations that model specific mechanisms, i.e. we have a set of equations that implement the activity-independent attraction to potential synapses (with parameters such as the density of synapses, their spatial influence, etc.) and the activity-dependent refinement of synapses (with parameters such as the ratio of BDNF and proBDNF to induce potentiation vs depression, the activity-dependent conversion of one factor to the other, etc.). This is a bottom-up approach where we combine multiple elements together to get to neuronal growth and synaptic organization. This approach is in stark contrast to the so-called top-down or normative approaches, where the method would involve defining an objective function (e.g. minimal dendritic length) which depends on a set of parameters and then applying gradient descent or another mathematical optimization technique to find the parameters that optimize the objective function. This latter approach we would not call mechanistic because it involves an abstract objective function (who could say what a neuron or a circuit should be trying to optimize?) and a mathematical technique for how to optimize the function (we don’t know if neurons can compute gradients of abstract objective functions).

      Hence our model is mechanistic, but it does operate at a particular level of abstraction/simplification. We don’t model individual ion channels or the biophysics of synaptic plasticity (opening and closing of NMDA channels, accumulation of proteins at synapses, protein synthesis). We do, however, provide a biophysical implementation of the plasticity mechanism through the BDNF/proBDNF model, which is more than most models of plasticity achieve, because they typically model a phenomenological STDP or Hebbian rule that just uses activity patterns to potentiate or depress synaptic weights, disregarding how it could be implemented.
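
      To make the notion of a bottom-up, local rule concrete, here is a deliberately simplified toy sketch. It is not the set of equations used in the model: the update rule, the parameter `eta` (a hypothetical stand-in for the balance between a potentiating BDNF-like factor and a depressing proBDNF-like factor), the learning rate and the removal threshold are all assumptions chosen for illustration. The point is only that each synapse changes according to locally available signals, with no global objective function or gradient.

```python
def plasticity_step(w, synapse_active, neighbours_active, eta, rate=0.1):
    """One update of a synaptic weight w in [0, 1] under a toy local rule.

    eta in [0, 1] is a hypothetical stand-in for the balance between a
    potentiating (BDNF-like) and a depressing (proBDNF-like) factor.
    """
    if synapse_active and neighbours_active:
        dw = rate * eta * (1.0 - w)       # correlated activity: potentiation
    elif synapse_active or neighbours_active:
        dw = -rate * (1.0 - eta) * w      # uncorrelated activity: depression
    else:
        dw = 0.0                          # no activity: no change
    return min(1.0, max(0.0, w + dw))


def survives(w, threshold=0.1):
    """A synapse whose weight falls below the (hypothetical) threshold is removed."""
    return w >= threshold


# A synapse that is repeatedly active out of sync with its neighbours weakens.
w = 0.5
for _ in range(50):
    w = plasticity_step(w, synapse_active=True, neighbours_active=False, eta=0.5)
kept = survives(w)
```

      In this toy, shifting `eta` toward 1 favours potentiation and shifting it toward 0 favours depression, which is the qualitative role the BDNF/proBDNF ratio plays in the description above.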

      Reviewer #2 (Public Review):

      This work combines a model of two-dimensional dendritic growth with attraction and stabilisation by synaptic activity. The authors find that constraining growth models with competition for synaptic inputs produces artificial dendrites that match some key features of real neurons both over development and in terms of final structure. In particular, incorporating distance-dependent competition between synapses of the same dendrite naturally produces distinct phases of dendritic growth (overshoot, pruning, and stabilisation) that are observed biologically and leads to local synaptic organisation with functional relevance. The approach is elegant and well-explained, but makes some significant modelling assumptions that might impact the biological relevance of the results.

      Strengths:

      The main strength of the work is the general concept of combining morphological models of growth with synaptic plasticity and stabilisation. This is an interesting way to bridge two distinct areas of neuroscience in a manner that leads to findings that could be significant for both. The modelling of both dendritic growth and distance-dependent synaptic competition is carefully done, constrained by reasonable biological mechanisms, and well-described in the text. The paper also links its findings, for example in terms of phases of dendritic growth or final morphological structure, to known data well.

      Weaknesses:

      The major weaknesses of the paper are the simplifying modelling assumptions that are likely to have an impact on the results. These assumptions are not discussed in enough detail in the current version of the paper.

      1) Axonal dynamics.

      A major, and lightly acknowledged, assumption of this paper is that potential synapses, which must come from axons, are fixed in space. This is not realistic for many neural systems, as multiple undifferentiated neurites typically grow from the soma before an axon is specified (Polleux & Snider, 2010). Further, axons are also dynamic structures in early development and, at least in some systems, undergo activity-dependent morphological changes too (O'Leary, 1987; Hall 2000). This paper does not consider the implications of joint pre- and post-synaptic growth and stabilisation.

      We thank the reviewer for the summary of the strengths and weaknesses of the work. While we feel that including a full model of axonal dynamics is beyond the scope of the current manuscript, some aspects of axonal dynamics can be included. In a revised model, we will introduce a gradual increase in the density of potential synapses and implement a cap on the number of synapses that can be alive at the same time, and vary that cap. We plan to also introduce a lifetime of each synapse (following for example a lognormal distribution). A potential synapse can disappear if it does not form a stable synapse in its lifetime, in which case it could move to a different location. See also our reply to reviewer comment 1.1, bullet (3).
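
      The planned changes can be sketched as follows. This is a minimal toy, assuming one new potential synapse may appear per time step with a fixed probability, that lifetimes are drawn from a lognormal distribution, and that expired potential synapses simply disappear (stabilisation into a functional synapse and relocation are not modelled here). All parameter names and values are hypothetical placeholders, not those of the revised model.

```python
import random


def simulate_potential_synapses(n_steps, cap, arrival_prob, mu, sigma, seed=0):
    """Toy turnover of potential synapses with lognormal lifetimes and a cap.

    Returns the number of potential synapses alive at each time step.
    """
    rng = random.Random(seed)
    alive = []     # remaining lifetime (in time steps) of each potential synapse
    counts = []
    for _ in range(n_steps):
        # age every potential synapse by one step; drop those that expired
        alive = [t - 1.0 for t in alive if t - 1.0 > 0.0]
        # a new potential synapse may appear, but only while below the cap
        if len(alive) < cap and rng.random() < arrival_prob:
            alive.append(rng.lognormvariate(mu, sigma))
        counts.append(len(alive))
    return counts


counts = simulate_potential_synapses(
    n_steps=200, cap=10, arrival_prob=0.5, mu=2.0, sigma=0.5
)
```

      Varying `cap` in such a sketch corresponds to varying the maximum number of potential synapses that can be alive at the same time.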

      2) Activity correlations

      On a related note, the synapses in the manuscript display correlated activity, but there is no relationship between the distance between synapses and their correlation. In reality, nearby synapses are far more likely to share the same axon and so display correlated activity. If the input activity is spatially correlated and synaptic plasticity displays distance-dependent competition in the dendrites, there is likely to be a non-trivial interaction between these two features with a major impact on the organisation of synaptic contacts onto each neuron.

      We are exploring the amount of correlation (between and within correlated groups) to include in the revised manuscript (see also our reply to reviewer comment 1.1, bullet (1)).

      However, previous experimental work (Kleindienst et al., 2011 https://doi.org/10.1016/j.neuron.2011.10.015) has provided anatomical and functional analyses indicating that the functional synaptic clustering on dendritic branches is unlikely to be the result of individual axons making more than one synapse (see pg. 1019).

      3) BDNF dynamics

      The models are quite sensitive to the ratio of BDNF to proBDNF (eg Figure 5c). This ratio is also activity-dependent as synaptic activation converts proBDNF into BDNF. The models assume a fixed ratio that is not affected by synaptic activity. There should at least be more justification for this assumption, as there is likely to be a positive feedback relationship between levels of BDNF and synaptic activation.

      The reviewer is correct. We used the BDNF-proBDNF model for synaptic plasticity based on our previous work: Kirchner and Gjorgjieva, 2021 https://www.nature.com/articles/s41467-021-23557-3.

      There, we explored only the emergence of functionally clustered synapses on static dendrites which do not grow. In the Methods section (Parameters and data fitting) we justify the choice of the ratio of BDNF to proBDNF from published experimental work. We also performed sensitivity analysis (Supplementary Fig. 1) and perturbation simulations (Supplementary Fig. 3), which showed that the ratio is crucial in regulating the overall amount of potentiation and depression of synaptic efficacy, and therefore has a strong impact on the emergence and maintenance of synaptic organization. Since we already performed all this analysis, we do not expect there will be any differences in the current model which includes dendritic growth, as the activity-dependent mechanism has such a different timescale.

      A further weakness is in the discussion of how the final morphologies conform to principles of optimal wiring, which is quite imprecise. 'Optimal wiring' in the sense of dendrites and axons (Cajal, 1895; Chklovskii, 2004; Cuntz et al, 2007, Budd et al, 2010) is not usually synonymous with 'shortest wiring' as implied here. Instead, there is assumed to be a balance between minimising total dendritic length and minimising the tree distance (ie Figure 4c here) between synapses and the site of input integration, typically the soma. The level of this balance gives the deviation from the theoretical minimum length as direct paths to synapses typically require longer dendrites. In the model this is generated by the guidance of dendritic growth directly towards the synaptic targets. The interpretation of the deviation in this results section discussing optimal wiring, with hampered diffusion of signalling molecules, does not seem to be correct.

      We agree with this comment. We had wrongly used the term “optimal wiring”, as neurons can optimize their wiring not only by minimizing their dendritic length but also through other factors, as noted by the reviewer. In the revised manuscript, we will replace the term “optimal wiring” with “minimal wiring” and discuss these differences from previous work.
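
      The trade-off raised by the reviewer, between total dendritic length and path length to the soma, can be illustrated with a greedy tree builder in the spirit of the balancing-factor approach in the optimal-wiring literature cited above. The exact cost function below, the parameter name `bf` and the example coordinates are assumptions made for this sketch and are not taken from the manuscript.

```python
import math


def greedy_tree(points, bf):
    """Grow a tree from points[0] (the soma), trading cable length against path length.

    At each step, the unconnected point with the lowest cost
        cost = d(node, point) + bf * (path_to_soma[node] + d(node, point))
    is attached to the tree; with bf = 0 this reduces to Prim's minimum spanning tree.
    Returns (total cable length, mean path length from the soma to the other points).
    """
    n = len(points)
    path = {0: 0.0}   # path length along the tree from the soma to each connected node
    total = 0.0
    while len(path) < n:
        best = None
        for u in path:
            for v in range(n):
                if v in path:
                    continue
                d = math.dist(points[u], points[v])
                cost = d + bf * (path[u] + d)
                if best is None or cost < best[0]:
                    best = (cost, u, v, d)
        _, u, v, d = best
        path[v] = path[u] + d
        total += d
    mean_path = sum(path.values()) / (n - 1) if n > 1 else 0.0
    return total, mean_path


# Hypothetical example: soma at the origin and three synapses.
points = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
short_cable = greedy_tree(points, bf=0.0)   # minimises cable length only
direct_paths = greedy_tree(points, bf=1.0)  # also penalises long paths to the soma
```

      In this example, raising `bf` produces a tree with more cable but shorter paths to the soma, which is why a deviation from the minimal length does not by itself indicate suboptimal wiring.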

      Reviewer #3 (Public Review):

      The authors propose a mechanistic model of how the interplay between activity-independent growth and an activity-dependent synaptic strengthening/weakening model influences dendrite shape, complexity and distribution of synapses. The authors focus on a model for stellate cells, which have multiple dendrites emerging from a soma. The activity-independent component is provided by a random pool of presynaptic sites that represent potential synapses and that release a diffusible signal that promotes dendritic growth. Then a spontaneous activity pattern with some correlation structure is imposed at those presynaptic sites. The strength of these synapses follows a learning rule previously proposed by the lab: synapses strengthen when there is correlated firing across multiple sites, and synapses weaken if there is uncorrelated firing, with the relative strength of these processes controlled by available levels of BDNF/proBDNF. Once a synapse is weakened below a threshold, the dendrite branch at that site retracts and loses its sensitivity to the growth signal.

      The authors run the simulation and map out how dendrites and synapses evolve and stabilize. They show that dendritic trees grow rapidly and then stabilize by balancing growth and retraction (Figure 2). They also show that there is an initial bout of synaptogenesis followed by loss of synapses, reflecting the longer amount of time it takes to weaken a synapse (Figure 3). They analyze how this evolution of dendrites and synapses depends on the correlated firing of synapses (i.e. defined as being in the same "activity group"). They show that in the stabilized phase, synapses that remain connected to a given dendritic branch are likely to be from the same activity group (Figure 4). The authors systematically alter the learning rule by changing the available concentration of BDNF, which alters the relative amount of synaptic strengthening, which in turn affects stabilization, density of synapses and, interestingly, how selective for an activity group one dendrite is (Figure 5). In addition, the authors look at how altering the activity-independent factors influences outgrowth (Figure 6). Finally, one of the interesting outcomes is that the resulting dendritic trees represent "optimal wiring" solutions in the sense that dendrites use the shortest distance given the distribution of synapses. They compare this distribution to published data to see how the model compares to what has been observed experimentally.

      There are many strengths to this study. The consequence of adding the activity-dependent contribution to models of synapto- and dendritogenesis is novel. There is some exploration of parameter space with the motivation of keeping the parameters as well as the generated outcomes close to anatomical data of real dendrites. The paper is also scholarly in its comparison of this approach to previous generative models. This work represents an important advance in our understanding of how learning rules can contribute to dendrite morphogenesis.

      We thank the reviewer for the positive evaluation of the work and the suggestions below.

    1. Author response:

      Reviewer #1 (Evidence, reproducibility and clarity):

      The authors have provided a mechanism by which the presence of truncated P53 can inactivate the function of the full-length P53 protein. The authors propose that this happens by sequestration of full-length P53 by truncated P53.

      In the study, performed experiments are well described.

      My area of expertise is molecular biology/gene expression, and I have tried to provide suggestions within my area of expertise. The study has been done mainly with an overexpression system, and I have included a few comments which I think can help to understand the effect of truncated P53 on the endogenous wild-type full-length protein. Performing experiments along these lines will add value to the observation according to this reviewer.

      Major comments:

      (1) It is not clear what happens to endogenous wild-type full-length P53 in the context of mutant/truncated isoforms. Using a P53 antibody which can detect endogenous wild-type P53, can the authors check whether endogenous full-length P53 protein is also aggregated? It is hard to differentiate if aggregation of full-length P53 happens only in overexpression scenario, where a lot more of both proteins is expressed. Under normal physiological conditions, P53 expression is usually low and tightly controlled, and its expression is induced in altered cellular conditions such as DNA damage. So, it is important to understand the physiological relevance of such aggregation, which would be possible if the authors could investigate the effect on endogenous full-length P53 following overexpression of mutant isoforms.

      Thank you very much for your insightful comments.

      (1) To address “what happens to endogenous wild-type full-length P53 in the context of mutant/truncated isoforms,” we employed a human A549 cell line expressing endogenous wild-type p53 under DNA damage conditions such as etoposide treatment(1). We chose the A549 cell line since, similar to H1299, it is a lung cancer cell line (www.atcc.org). For comparison, we also transfected the cells with 2 μg of V5-tagged plasmids encoding FLp53 and its isoforms Δ133p53 and Δ160p53. As shown in Author response image 1A, lanes 1 and 2, endogenous p53 expression remained undetectable in A549 cells despite etoposide treatment, which limits our ability to assess the effects of the isoforms on the endogenous wild-type FLp53. We could, however, detect the V5-tagged FLp53 expressed from the plasmid using the anti-V5 (rabbit) as well as the anti-DO-1 (mouse) antibody (Author response image 1). The latter detects both endogenous wild-type p53 and the V5-tagged FLp53 since the antibody epitope is within the N-terminus (aa 20-25). This result supports the reviewer’s comment regarding the low level of expression of endogenous p53, which is insufficient for detection in our experiments.

      In summary, in line with the reviewer’s comment that ‘under normal physiological conditions p53 expression is usually low,’ we could not detect p53 with an anti-DO-1 antibody. Thus, we proceeded with V5/FLAG-tagged p53 for detection of the effects of the isoforms on p53 stability and function. We also found that protein expression in H1299 cells was more easily detectable than in A549 cells (compare Author response image 1A and B). Thus, we decided to continue with the H1299 cells (p53-null), which serve as a more suitable model system for this study.

      (2) We agree with the reviewer that ‘It is hard to differentiate if aggregation of full-length p53 happens only in overexpression scenario’. However, it is not impossible to imagine that such aggregation of FLp53 happens under conditions when p53 and its isoforms are over-expressed in the cell. Although the exact physiological context is not known and is beyond the scope of the current work, our results indicate that at higher expression, p53 isoforms drive aggregation of FLp53. Given the challenges of detecting endogenous FLp53, we had to rely on the results obtained with plasmid-mediated expression of p53 and its isoforms in p53-null cells.

      Author response image 1.

      Comparative analysis of protein expression in A549 and H1299 cells. (A) A549 cells (p53 wild-type) were treated with etoposide to induce endogenous wild-type p53 expression. To assess the effects of FLp53 and its isoforms Δ133p53 and Δ160p53 on endogenous wild-type p53 aggregation, A549 cells were transfected with 2 μg of V5-tagged p53 expression plasmids, with or without etoposide (20 μM for 8 h) treatment. Western blot analysis was done with the anti-V5 (rabbit) antibody to detect V5-tagged proteins and the anti-DO-1 (mouse) antibody; the latter detects both endogenous wild-type p53 and V5-tagged FLp53. The merged image corresponds to the overlay between the V5 and DO-1 antibody signals. (B) H1299 cells (p53-null) were transfected with 2 μg of V5-tagged p53 expression plasmids or the empty vector control pcDNA3.1. Western blot analysis was done with the anti-V5 (mouse) antibody.

      (2) Can the presence of mutant P53 isoforms cause functional impairment of wild-type full-length endogenous P53? That could be tested as well using a ChIP assay similar to the one the authors have performed, but instead of an antibody against the tagged protein, the authors could check endogenous P53 enrichment at a gene promoter such as P21 following overexpression of mutant isoforms. Maybe introducing a condition such as DNA damage in such an experiment might help, where endogenous P53 is induced and more prone to bind to P53 targets such as P21.

      Thank you very much for your valuable comments and suggestions. To investigate the potential functional impairment of endogenous wild-type p53 by p53 isoforms, we initially utilized A549 cells (p53 wild-type), aiming to monitor endogenous wild-type p53 expression following DNA damage. However, as mentioned and demonstrated in Author response image 1, endogenous p53 expression was too low to be detected under these conditions, making the ChIP assay for analyzing endogenous p53 activity unfeasible. Thus, we decided to utilize plasmid-based expression of FLp53 and focus on the potential functional impairment induced by the isoforms.

      (3) On similar lines, authors described:

      "To test this hypothesis, we escalated the ratio of FLp53 to isoforms to 1:10. As expected, the activity of all four promoters decreased significantly at this ratio (Figure 4A-D). Notably, Δ160p53 showed a more potent inhibitory effect than Δ133p53 at the 1:5 ratio on all promoters except for the p21 promoter, where their impacts were similar (Figure 4E-H). However, at the 1:10 ratio, Δ133p53 and Δ160p53 had similar effects on all transactivation except for the MDM2 promoter (Figure 4E-H)."

      Again, in this assay the authors used ratios of 1:5 to 1:10 of full-length vs mutant. How do the authors justify this result in the more relevant context where one allele is wild type (functional P53) and the other allele is mutated (truncated, can induce aggregation)? In this case one would expect a 1:1 ratio of full-length vs mutant protein, unless other regulation is going on which induces expression of mutant isoforms more than the wild-type full-length protein. Discussing along these lines might provide more physiological relevance to the observed data.

      Thank you for raising this point regarding the physiological relevance of the ratios used in our study.

      (1) In the revised manuscript (lines 193-195), we added in this direction that “The elevated Δ133p53 protein modulates p53 target genes such as miR‑34a and p21, facilitating cancer development(2, 3). To mimic conditions where isoforms are upregulated relative to FLp53, we increased the ratios to 1:5 and 1:10.” This approach aims to simulate scenarios where isoforms accumulate at higher levels than FLp53, which may be relevant in specific contexts, as also elaborated above.

      (2) Regarding the issue of protein expression, where one allele is wild-type and the other is isoform, this assumption is not valid in most contexts. First, human cells have two copies of the TP53 gene (one from each parent). Second, the TP53 gene has two distinct promoters: the proximal promoter (P1) primarily regulates FLp53 and ∆40p53, whereas the second promoter (P2) regulates ∆133p53 and ∆160p53(4, 5). Additionally, ∆133TP53 is a p53 target gene(6, 7) and the expression of Δ133p53 and FLp53 is dynamic in response to various stimuli. Third, the expression of p53 isoforms is regulated at multiple levels, including transcriptional, post-transcriptional, translational, and post-translational processing(8). Moreover, different degradation mechanisms modify the protein level of p53 isoforms and FLp53(8). These differential regulation mechanisms are regulated by various stimuli, and therefore, the 1:1 ratio of FLp53 to ∆133p53 or ∆160p53 may be valid only under certain physiological conditions. In line with this, varied expression levels of FLp53 and its isoforms, including ∆133p53 and ∆160p53, have been reported in several studies(3, 4, 9, 10).

      (3) In our study, using the pcDNA 3.1 vector under the human cytomegalovirus (CMV) promoter, we observed moderately higher expression levels of ∆133p53 and ∆160p53 relative to FLp53 (Author response image 1B). This overexpression scenario provides a model for studying conditions where isoform accumulation might surpass physiological levels, impacting FLp53 function. By employing elevated ratios of these isoforms to FLp53, we aim to investigate the potential effects of isoform accumulation on FLp53.

      (4) Finally, does this altered function of full-length P53 (preferably the endogenous one) in the presence of truncated P53 have any phenotypic consequence for the cells (if the authors choose a cell type with wild-type functional P53)? An assay such as apoptosis/cell cycle analysis could help to visualize this.

      Thank you for your insightful comments. In the experiment with A549 cells (p53 wild-type), endogenous p53 levels were too low to be detected, even after DNA damage induction. The evaluation of the function of endogenous p53 in the presence of isoforms is hindered, as mentioned above. In the revised manuscript, we utilized H1299 cells with overexpressed proteins for apoptosis studies using the Caspase-Glo® 3/7 assay (Figure 7). This has been shown in the Results section (lines 254-269). “The Δ133p53 and Δ160p53 proteins block pro-apoptotic function of FLp53.

      One of the physiological read-outs of FLp53 is its ability to induce apoptotic cell death(11). To investigate the effects of p53 isoforms Δ133p53 and Δ160p53 on FLp53-induced apoptosis, we measured caspase-3 and -7 activities in H1299 cells expressing different p53 isoforms (Figure 7). Caspase activation is a key biochemical event in apoptosis, with the activation of effector caspases (caspase-3 and -7) ultimately leading to apoptosis(12). The caspase-3 and -7 activities induced by FLp53 expression were approximately 2.5 times higher than those of the control vector (Figure 7). Co-expression of FLp53 and the isoforms Δ133p53 or Δ160p53 at a ratio of 1:5 significantly diminished the apoptotic activity of FLp53 (Figure 7). This result aligns well with our reporter gene assay, which demonstrated that elevated expression of Δ133p53 and Δ160p53 impaired the expression of apoptosis-inducing genes BAX and PUMA (Figure 4G and H). Moreover, a reduction in the apoptotic activity of FLp53 was observed irrespective of whether Δ133p53 or Δ160p53 protein was expressed with or without a FLAG tag (Figure 7). This result, therefore, also suggests that the FLAG tag does not affect the apoptotic activity or other physiological functions of FLp53 and its isoforms. Overall, the overexpression of p53 isoforms Δ133p53 and Δ160p53 significantly attenuates FLp53-induced apoptosis, independent of the protein tagging with the FLAG antibody epitope.”

      Referees cross-commenting

      I think the comments from the other reviewers are very much reasonable and logical.

      In particular, all 3 reviewers have indicated that a better way to visualize the aggregation of full-length wild-type P53 by truncated P53 (such as looking at endogenous P53, by reviewer 1; using a fluorescent tag, by reviewer 2; and the concern raised by reviewer 3 about the FLAG tag) would add more value to the observation.

      Thank you for these comments. The endogenous p53 protein was undetectable in A549 cells induced by etoposide (Figure R1A). Therefore, we conducted experiments using FLAG/V5-tagged FLp53.  To avoid any potential side effects of the FLAG tag on p53 aggregation, we introduced untagged p53 isoforms in the H1299 cells and performed subcellular fractionation. Our revised results, consistent with previous FLAG-tagged p53 isoforms findings, demonstrate that co-expression of untagged isoforms with FLAG-tagged FLp53 significantly induced the aggregation of FLAG-FLp53, while no aggregation was observed when FLAG-tagged FLp53 was expressed alone (Supplementary Figure 6). These results clearly indicate that the FLAG tag itself does not contribute to protein aggregation. 

      Additionally, we utilized the A11 antibody to detect protein aggregation, providing additional validation (Figure 8 from Jean-Christophe Bourdon et al. Genes Dev. 2005;19:2122-2137). Given that the fluorescent proteins (~30 kDa) are substantially bigger than the tags used here (~1 kDa) and may influence oligomerization (especially GFP), stability, localization, and function of p53 and its isoforms, we avoided conducting these vital experiments with such artificial large fusions. 

      Reviewer #1 (Significance):

      The work is significant, since it provides more mechanistic insight into how wild-type full-length P53 could be inactivated in the presence of truncated isoforms; this might offer new opportunities to recover P53 function as a treatment strategy against cancer.

      Thank you for your insightful comments. We appreciate your recognition of the significance of our work in providing mechanistic insights into how wild-type FLp53 can be inactivated by truncated isoforms. We agree that these findings have potential for exploring new strategies to restore p53 function as a therapeutic approach against cancer. 

      Reviewer #2 (Evidence, reproducibility and clarity):

      The manuscript by Zhao and colleagues presents a novel and compelling study on the p53 isoforms, Δ133p53 and Δ160p53, which are associated with aggressive cancer types. The main objective of the study was to understand how these isoforms exert a dominant negative effect on full-length p53 (FLp53). The authors discovered that the Δ133p53 and Δ160p53 proteins exhibit impaired binding to p53-regulated promoters. The data suggest that the predominant mechanism driving the dominant-negative effect is the coaggregation of FLp53 with Δ133p53 and Δ160p53.

      This study is innovative, well-executed, and supported by thorough data analysis. However, the authors should address the following points:

      (1) Introduction on Aggregation and Co-aggregation: Given that the focus of the study is on the aggregation and co-aggregation of the isoforms, the introduction should include a dedicated paragraph discussing this issue. There are several original research articles and reviews that could be cited to provide context.

      Thank you very much for the valuable comments. We have added the following paragraph in the revised manuscript (lines 74-82): “Protein aggregation has become a central focus of modern biology research and has documented implications in various diseases, including cancer(13, 14, 15). Protein aggregates can be of different types ranging from amorphous aggregates to highly structured amyloid or fibrillar aggregates, each with different physiological implications. In the case of p53, whether protein aggregation, and in particular, co-aggregation with large N-terminal deletion isoforms, plays a mechanistic role in its inactivation is yet underexplored. Interestingly, the Δ133p53β isoform has been shown to aggregate in several human cancer cell lines(16). Additionally, the Δ40p53α isoform exhibits a high aggregation tendency in endometrial cancer cells(17). Although no direct evidence exists for Δ160p53 yet, these findings imply that p53 isoform aggregation may play a major role in their mechanisms of actions.”

      (2) Antibody Use for Aggregation: To strengthen the evidence for aggregation, the authors should consider using antibodies that specifically bind to aggregates.

      Thank you for your insightful suggestion. We addressed protein aggregation using the A11 antibody, which specifically recognizes amyloid-like protein aggregates. We analyzed insoluble nuclear pellet samples prepared under identical conditions as described in Figure 6B. To confirm the presence of p53 proteins, we employed the anti-p53 M19 antibody (Santa Cruz, Cat No. sc-1312) to detect bands corresponding to FLp53 and its isoforms Δ133p53 and Δ160p53. The monomer FLp53 was not detected (Figure 8, lower panel, Jean-Christophe Bourdon et al. Genes Dev. 2005;19:2122-2137), which may be attributed to the lower binding affinity of the anti-p53 M19 antibody to it. These samples were also immunoprecipitated using the A11 antibody (Thermo Fisher Scientific, Cat No. AHB0052) to detect aggregated proteins. Interestingly, FLp53 and its isoforms, Δ133p53 and Δ160p53, were clearly visible with the anti-A11 antibody when co-expressed at a 1:5 ratio, suggesting that they underwent co-aggregation. However, no FLp53 aggregates were observed when it was expressed alone (Author response image 2). These results support the conclusion in our manuscript that Δ133p53 and Δ160p53 drive FLp53 aggregation.

      Author response image 2.

      Induction of FLp53 Aggregation by p53 Isoforms Δ133p53 and Δ160p53. H1299 cells were transfected with the FLAG-tagged FLp53 and V5-tagged Δ133p53 or Δ160p53 at a 1:5 ratio. The cells were subjected to subcellular fractionation, and the resulting insoluble nuclear pellet was resuspended in RIPA buffer. The samples were heated at 95°C until the pellet was completely dissolved, and then analyzed by Western blotting. Immunoprecipitation was performed using the A11 antibody, which specifically recognizes amyloid protein aggregates, and the anti-p53 M19 antibody, which detects FLp53 as well as its isoforms Δ133p53 and Δ160p53.

      (3) Fluorescence Microscopy: Live-cell fluorescence microscopy could be employed to enhance visualization by labeling FLp53 and the isoforms with different fluorescent markers (e.g., EGFP and mCherry tags).

      We appreciate the suggestion to use live-cell fluorescence microscopy with EGFP and mCherry tags for the visualization of FLp53 and its isoforms. While we understand the advantages of live-cell imaging with EGFP/mCherry tags, we refrained from making such fusions because GFP and related protein tags are very large (~30 kDa) relative to the p53 isoform variants (~30 kDa). Other studies have shown that EGFP and mCherry fusions can alter protein oligomerization, solubility and aggregation(18, 19). Moreover, most fluorescent proteins are prone to dimerization (e.g. EGFP) or form obligate tetramers (DsRed)(20, 21, 22), potentially interfering with the oligomerization and aggregation properties of p53 isoforms, particularly Δ133p53 and Δ160p53.

      Instead, we utilized FLAG- or V5-tag-based immunofluorescence microscopy, a well-established and widely accepted method for visualizing p53 proteins. This method provided precise localization and reliable quantitative data, which we believe meet the needs of the current study. We believe our chosen method is both appropriate and sufficient for addressing the research question.

      Reviewer #2 (Significance):

      The manuscript by Zhao and colleagues presents a novel and compelling study on the p53 isoforms, Δ133p53 and Δ160p53, which are associated with aggressive cancer types. The main objective of the study was to understand how these isoforms exert a dominant negative effect on full-length p53 (FLp53). The authors discovered that the Δ133p53 and Δ160p53 proteins exhibit impaired binding to p53-regulated promoters. The data suggest that the predominant mechanism driving the dominant-negative effect is the coaggregation of FLp53 with Δ133p53 and Δ160p53.

      We sincerely thank the reviewer for the thoughtful and positive comments on our manuscript and for highlighting the significance of our findings on the p53 isoforms, Δ133p53 and Δ160p53. 

      Reviewer #3 (Evidence, reproducibility and clarity):

      In this manuscript entitled "Δ133p53 and Δ160p53 isoforms of the tumor suppressor protein p53 exert dominant-negative effect primarily by coaggregation", the authors suggest that the Δ133p53 and Δ160p53 isoforms have high aggregation propensity and that by co-aggregating with canonical p53 (FLp53), they sequestrate it away from DNA thus exerting a dominant-negative effect over it.

      First, the authors should make it clear throughout the manuscript, including the title, that they are investigating Δ133p53α and Δ160p53α since there are 3 Δ133p53 isoforms (α, β, γ), and 3 Δ160p53 isoforms (α, β, γ).

      Thank you for your suggestion. We understand the importance of clearly specifying the isoforms under study. Following your suggestion, we have added α in the title, abstract, and introduction and added the following statement in the Introduction (lines 57-59): “For convenience and simplicity, we have written Δ133p53 and Δ160p53 to represent the α isoforms (Δ133p53α and Δ160p53α) throughout this manuscript.” 

      One concern is that the authors only consider and explore the Δ133p53α and Δ160p53α isoforms as exclusively oncogenic and FLp53 dominant-negative, without discussing evidence of different activities. Indeed, other manuscripts have shown that Δ133p53α is non-oncogenic and non-mutagenic, does not antagonize every FLp53 function, and is sometimes associated with good prognosis. To cite a few examples:

      (1) Hofstetter G. et al. D133p53 is an independent prognostic marker in p53 mutant advanced serous ovarian cancer. Br. J. Cancer 2011, 105, 1593-1599.

      (2) Bischof, K. et al. Influence of p53 Isoform Expression on Survival in High-Grade Serous Ovarian Cancers. Sci. Rep. 2019, 9, 5244.

      (3) Knezović F. et al. The role of p53 isoforms' expression and p53 mutation status in renal cell cancer prognosis. Urol. Oncol. 2019, 37, 578.e1-578.e10.

      (4) Gong, L. et al. p53 isoform D113p53/D133p53 promotes DNA double-strand break repair to protect cell from death and senescence in response to DNA damage. Cell Res. 2015, 25, 351-369.

      (5) Gong, L. et al. p53 isoform D133p53 promotes efficiency of induced pluripotent stem cells and ensures genomic integrity during reprogramming. Sci. Rep. 2016, 6, 37281.

      (6) Horikawa, I. et al. D133p53 represses p53-inducible senescence genes and enhances the generation of human induced pluripotent stem cells. Cell Death Differ. 2017, 24, 1017-1028.

      (7) Gong, L. p53 coordinates with D133p53 isoform to promote cell survival under low-level oxidative stress. J. Mol. Cell Biol. 2016, 8, 88-90.

      Thank you very much for your comment and for highlighting these important studies. 

      We agree that Δ133p53 isoforms exhibit complex biological functions, with both oncogenic and non-oncogenic potentials. However, our aim here was primarily to reveal the molecular mechanism of the dominant-negative effects exerted by the Δ133p53α and Δ160p53α isoforms on FLp53, for which these isoforms are suitable model systems. Exploring the oncogenic potential of the isoforms is beyond the scope of the current study, and we have not claimed anywhere that we are reporting it. We have carefully revised the manuscript and replaced the respective terms, e.g. ‘pro-oncogenic activity’ with ‘dominant-negative effect’, in relevant places (e.g. line 90). We have now also added a paragraph with suitable references that introduces the oncogenic and non-oncogenic roles of the p53 isoforms.

      After reviewing the papers you cited, we are not sure that they settle the oncogenic or non-oncogenic role of the Δ133p53α isoform across different cancer cases. Although our study is not about the oncogenic potential of the isoforms, we have summarized the key findings below:

      (1) Hofstetter et al., 2011: Demonstrated that Δ133p53α expression improved recurrence-free and overall survival in p53-mutant advanced serous ovarian cancer, suggesting a potential protective role in this context.

      (2) Bischof et al., 2019: Found that Δ133p53 mRNA can improve overall survival in high-grade serous ovarian cancers. However, out of 31 patients, only 5 belong to the TP53 wild-type group, while the others carry TP53 mutations.

      (3) Knezović et al., 2019: Reported downregulation of Δ133p53 in renal cell carcinoma tissues with wild-type p53 compared to normal adjacent tissue, indicating a potential non-oncogenic role, but not conclusively demonstrating it.

      (4) Gong et al., 2015: Showed that Δ133p53 antagonizes p53-mediated apoptosis and promotes DNA double-strand break repair by upregulating RAD51, LIG4, and RAD52 independently of FLp53.

      (5) Gong et al., 2016: Demonstrated that overexpression of Δ133p53 promotes the efficiency of cell reprogramming through its anti-apoptotic function and by promoting DNA DSB repair. The authors hypothesize that this mechanism involves increased RAD51 foci formation and decreased γH2AX foci formation and chromosome aberrations in induced pluripotent stem (iPS) cells, independent of FLp53.

      (6) Horikawa et al., 2017: Indicated that induced pluripotent stem cells derived from fibroblasts that overexpress Δ133p53 formed noncancerous tumors in mice compared to induced pluripotent stem cells derived from fibroblasts with complete p53 inhibition. Thus, Δ133p53 overexpression is "non- or less oncogenic and mutagenic" compared to complete p53 inhibition, but it still compromises certain p53-mediated tumor-suppressing pathways. “Overexpressed Δ133p53 prevented FL-p53 from binding to the regulatory regions of p21WAF1 and miR-34a promoters, providing a mechanistic basis for its dominant-negative inhibition of a subset of p53 target genes.”

      (7) Gong, 2016: Suggested that Δ133p53 promotes cell survival under low-level oxidative stress, but its role under different stress conditions remains uncertain.

      We have revised the Introduction to provide a more balanced discussion of Δ133p53’s dual role (lines 62-73):

      “The Δ133p53 isoform exhibit complex biological functions, with both oncogenic and non-oncogenic potentials. Recent studies demonstrate the non-oncogenic yet context-dependent role of the Δ133p53 isoform in cancer development. Δ133p53 expression has been reported to correlate with improved survival in patients with TP53 mutations(23, 24), where it promotes cell survival in a nononcogenic manner(25, 26), especially under low oxidative stress(27). Alternatively, other recent evidences emphasize the notable oncogenic functions of Δ133p53 as it can inhibit p53-dependent apoptosis by directly interacting with the FLp53 (4, 6). The oncogenic function of the newly identified Δ160p53 isoform is less known, although it is associated with p53 mutation-driven tumorigenesis(28) and in melanoma cells’ aggressiveness(10). Whether or not the Δ160p53 isoform also impedes FLp53 function in a similar way as Δ133p53 is an open question. However, these p53 isoforms can certainly compromise p53-mediated tumor suppression by interfering with FLp53 binding to target genes such as p21 and miR-34a(2, 29) by dominant-negative effect, the exact mechanism is not known.”

      On the figures presented in this manuscript, I have three major concerns:

      (1) Most results in the manuscript rely on the overexpression of the FLAG-tagged or V5-tagged isoforms. The validation of these constructs entirely depends on Supplementary figure 3 which the authors claim "rules out the possibility that the FLAG epitope might contribute to this aggregation". However, I am not entirely convinced by that conclusion. Indeed, the ratio between the "regular" isoform and the aggregates is much higher in the FLAG-tagged constructs than in the V5-tagged constructs. We can visualize the aggregates easily in the FLAG-tagged experiment, but the imaging clearly had to be overexposed (given the white coloring demonstrating saturation of the main bands) to visualize them in the V5-tagged experiments. Therefore, I am not convinced that an effect of the FLAG-tag can be ruled out and more convincing data should be added.

      Thank you for raising this important concern. We have carefully considered your comments and have made several revisions to clarify and strengthen our conclusions.

      First, to address the potential influence of the FLAG and V5 tags on p53 isoform aggregation, we have revised Figure 2 and removed the previous Supplementary Figure 3, where non-specific antibody binding and higher molecular weight aggregates were not clearly interpretable. In the revised Figure 2, we have removed these potential aggregates, improving the clarity and accuracy of the data.

      To further rule out any tag-related artifacts, we conducted a co-immunoprecipitation assay with FLAG-tagged FLp53 and untagged Δ133p53 and Δ160p53 isoforms. The results (now shown in the new Supplementary Figure 3) fully agree with our previous results with FLAG-tagged and V5-tagged Δ133p53 and Δ160p53 isoforms and show interaction between the partners. This indicates that the FLAG and V5 tags do not interfere with the interaction between FLp53 and the isoforms. We have still used FLAG-tagged FLp53 because the endogenous p53 was undetectable and the FLAG-tagged FLp53 did not aggregate on its own.

      In the revised paper, we added the following sentences (Lines 146-152): “To rule out the possibility that the observed interactions between FLp53 and its isoforms Δ133p53 and Δ160p53 were artifacts caused by the FLAG and V5 antibody epitope tags, we co-expressed FLAG-tagged FLp53 with untagged Δ133p53 and Δ160p53. Immunoprecipitation assays demonstrated that FLAG-tagged FLp53 could indeed interact with the untagged Δ133p53 and Δ160p53 isoforms (Supplementary Figure 3, lanes 3 and 4), confirming formation of hetero-oligomers between FLp53 and its isoforms. These findings demonstrate that Δ133p53 and Δ160p53 can oligomerize with FLp53 and with each other.”

      Additionally, we performed subcellular fractionation experiments to compare the aggregation and localization of FLAG-tagged FLp53 when co-expressed either with V5-tagged or untagged Δ133p53/Δ160p53. In these experiments, the untagged isoforms also induced FLp53 aggregation, mirroring our previous results with the tagged isoforms (Supplementary Figure 5). We’ve added this result in the revised manuscript (lines 236-245): “To exclude the possibility that FLAG or V5 tags contribute to protein aggregation, we also conducted subcellular fractionation of H1299 cells expressing FLAG-tagged FLp53 along with untagged Δ133p53 or Δ160p53 at a 1:5 ratio. The results showed (Supplementary Figure 6) a similar distribution of FLp53 across cytoplasmic, nuclear, and insoluble nuclear fractions as in the case of tagged Δ133p53 or Δ160p53 (Figure 6A to D). Notably, the aggregation of untagged Δ133p53 or Δ160p53 markedly promoted the aggregation of FLAG-tagged FLp53 (Supplementary Figure 6B and D), demonstrating that the antibody epitope tags themselves do not contribute to protein aggregation.” 

      We’ve also discussed this in the Discussion section (lines 349-356): “In our study, we primarily utilized an overexpression strategy involving FLAG/V5-tagged proteins to investigate the effects of p53 isoforms Δ133p53 and Δ160p53 on the function of FLp53. To address concerns regarding potential overexpression artifacts, we performed the co-immunoprecipitation (Supplementary Figure 6) and caspase-3 and -7 activity (Figure 7) experiments with untagged Δ133p53 and Δ160p53. In both experimental systems, the untagged proteins behaved very similarly to the FLAG/V5 antibody epitope-containing proteins (Figures 6 and 7 and Supplementary Figure 6). Hence, the C-terminal tagging of FLp53 or its isoforms does not alter the biochemical and physiological functions of these proteins.”

      In summary, the revised data set and newly added experiments provide strong evidence that neither the FLAG nor the V5 tag contributes to the observed p53 isoform aggregation.

      (2) The authors demonstrate that to visualize the dominant-negative effect, Δ133p53α and Δ160p53α must be "present in a higher proportion than FLp53 in the tetramer" and they need at least a 1:5 transfection ratio since the 1:1 ratio shows no effect. However, in almost every single cell type, FLp53 is far more expressed than the isoforms, which makes it very unlikely to reach such stoichiometry in physiological conditions and makes me wonder if this mechanism naturally occurs at endogenous level. This limitation should be at least discussed.

      Thank you for your insightful comment. Evidence suggests that the expression levels of these isoforms, such as Δ133p53, can be significantly elevated relative to FLp53 under certain physiological conditions(3, 4, 9). For example, in some breast tumors, Δ133p53 mRNA is expressed at much higher levels than FLp53, suggesting a distinct expression profile of p53 isoforms compared to normal breast tissue(4). Similarly, in non-small cell lung cancer and the A549 lung cancer cell line, the expression level of the Δ133p53 transcript is significantly elevated compared to non-cancerous cells(3). Moreover, in specific cholangiocarcinoma cell lines, the Δ133p53/TAp53 expression ratio has been reported to increase to as high as 3:1(9). These observations indicate that the dominant-negative effect of the Δ133p53 isoform on FLp53 can occur under certain pathological conditions in which the relative amounts of FLp53 and the isoforms vary widely. Since data on the Δ160p53 isoform are scarce, we infer that the long N-terminal truncated isoforms may share a similar mechanism.

      (3) Figure 5C: I am concerned by the subcellular location of the Δ133p53α and Δ160p53α as they are commonly considered nuclear and not cytoplasmic as shown here, particularly since they retain the 3 nuclear localization sequences like the FLp53 (Bourdon JC et al. 2005; Mondal A et al. 2018; Horikawa I et al, 2017; Joruiz S. et al, 2024). However, Δ133p53α can form cytoplasmic speckles (Horikawa I et al, 2017) when it colocalizes with autophagy markers for its degradation.

      The authors should discuss this issue. Could this discrepancy be due to the high overexpression level of these isoforms? A co-staining with autophagy markers (p62, LC3B) would rule out (or confirm) activation of autophagy due to the overwhelming expression of the isoform.

      Thank you for your thoughtful comments. We have thoroughly reviewed all the papers you recommended (Bourdon JC et al., 2005; Mondal A et al., 2018; Horikawa I et al., 2017; Joruiz S. et al., 2024)(4, 29, 30, 31). Among these, only the study by Bourdon JC et al. (2005) provided data regarding the localization of Δ133p53(4). Interestingly, their findings align with our observations: in Figure 8 of Bourdon et al., Genes Dev. 2005;19:2122-2137, the protein does not exhibit predominantly nuclear localization. The discrepancy may be caused by a potentially confusing statement in that paper(4).

      The localization of p53 is governed by multiple factors, including its nuclear import and export(32). The isoforms Δ133p53 and Δ160p53 contain three nuclear localization sequences (NLS)(4). However, these isoforms may be trapped in the cytoplasm by aggregation, which could mask the NLS and thereby prevent nuclear import.

      Further, we acknowledge that Δ133p53 co-aggregates with the autophagy substrate p62/SQSTM1 and the autophagosome component LC3B in the cytoplasm during autophagic degradation in replicative senescence(33). We agree that high overexpression of these aggregation-prone proteins may induce endoplasmic reticulum (ER) stress and activate autophagy(34). This could explain the cytoplasmic localization in our experiments. However, it is also critical to consider that we observed aggregates in both the cytoplasm and the nucleus (Figures 6B and E and Supplementary Figure 6B). While cytoplasmic localization may involve autophagy-related mechanisms, the nuclear aggregates likely arise from intrinsic isoform properties, such as altered protein folding, independent of autophagy. These dual localizations reflect the complex behavior of Δ133p53 and Δ160p53 isoforms under our experimental conditions.

      In the revised manuscript, we discuss this in the Discussion (lines 328-335): “Moreover, the observed cytoplasmic isoform aggregates may reflect autophagy-related degradation, as suggested by the co-localization of Δ133p53 with autophagy substrate p62/SQSTM1 and autophagosome component LC3B(33). High overexpression of these aggregation-prone proteins could induce endoplasmic reticulum stress and activate autophagy(34). Interestingly, we also observed nuclear aggregation of these isoforms (Figure 6B and E and Supplementary Figure 6B), suggesting that distinct mechanisms, such as intrinsic properties of the isoforms, may govern their localization and behavior within the nucleus. This dual localization underscores the complexity of Δ133p53 and Δ160p53 behavior in cellular systems.”

      Minor concerns:

      -  Figure 1A: the initiation of the "Δ140p53" is shown instead of "Δ40p53"

      Thank you! Figure 1A has been corrected in the revised paper.

      -  Figure 2A: I would like to see the images cropped a bit higher, so the cut does not happen just above the aggregate bands

      Thank you for this suggestion. We’ve changed the image, and the new Figure 2 is shown in the revised paper.

      -  Figure 3C: what ratio of FLp53/Delta isoform was used?

      We have added the ratio in the figure legend of Figure 3C (lines 845-846): “Relative DNA-binding of the FLp53-FLAG protein to the p53-target gene promoters in the presence of the V5-tagged protein Δ133p53 or Δ160p53 at a 1:1 ratio.”

      -  Figure 3C suggests that the "dominant-negative" effect is mostly senescence-specific as it does not affect apoptosis target genes, which is consistent with Horikawa et al, 2017 and Gong et al, 2016 cited above. Furthermore, since these two references and the others from Gong et al. show that Δ133p53α increases DNA repair genes, it would be interesting to look at RAD51, RAD52 or Lig4, and maybe also induce stress.

      Thank you for your thoughtful comments and suggestions. In Figure 3C, the presence of Δ133p53 or Δ160p53 significantly reduced the binding of FLp53 only to the p21 promoter. However, the isoforms Δ133p53 and Δ160p53 themselves showed a significant loss of DNA-binding activity at all four promoters: p21, MDM2, and the apoptosis target genes BAX and PUMA (Figure 3B). This result suggests that Δ133p53 and Δ160p53 have the potential to influence FLp53 function through their ability to form hetero-oligomers with FLp53 or their intrinsic tendency to aggregate. To investigate this further, we increased the isoform-to-FLp53 ratio in Figure 4, which demonstrates that the isoforms Δ133p53 and Δ160p53 exert dominant-negative effects on the function of FLp53.

      These results demonstrate that the isoforms can compromise p53-mediated pathways, consistent with Horikawa et al. (2017), which showed that Δ133p53α overexpression is "non- or less oncogenic and mutagenic" compared to complete p53 inhibition, but still affects specific tumor-suppressing pathways. Furthermore, as noted by Gong et al. (2016), Δ133p53’s anti-apoptotic function under certain conditions is independent of FLp53 and unrelated to its dominant-negative effects.

      We appreciate your suggestion to investigate DNA repair genes such as RAD51, RAD52, or Lig4, especially under stress conditions. While these targets are intriguing and relevant, we believe that our current investigation of p53 targets in this manuscript sufficiently supports our conclusions regarding the dominant-negative effect. Further exploration of additional p53 target genes, including those involved in DNA repair, will be an important focus of our future studies.

      - Figure 5A and B: directly comparing the level of FLp53 expressed in cytoplasm or nucleus to the level of Δ133p53α and Δ160p53α expressed in cytoplasm or nucleus does not mean much since these are overexpressed proteins and therefore depend on the level of expression. The authors should rather compare the ratio of cytoplasmic/nuclear FLp53 to the ratio of cytoplasmic/nuclear Δ133p53α and Δ160p53α.

      Thank you very much for this valuable suggestion. In the revised paper, Figure 5B has been recreated. Changes have been made in lines 214-215: “The cytoplasm-to-nucleus ratio of Δ133p53 and Δ160p53 was approximately 1.5-fold higher than that of FLp53 (Figure 5B).”
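
      The ratio-based comparison requested by the reviewer can be sketched as follows. This is only an illustration of the arithmetic, not the analysis pipeline used in the manuscript: the function names are hypothetical, and the band intensities are arbitrary placeholder numbers, not measured data.

```python
def cyto_to_nuc_ratio(cytoplasmic: float, nuclear: float) -> float:
    """Cytoplasm-to-nucleus ratio for one protein (same units for both inputs)."""
    if nuclear <= 0:
        raise ValueError("nuclear signal must be positive")
    return cytoplasmic / nuclear


def fold_vs_reference(isoform: tuple[float, float],
                      reference: tuple[float, float]) -> float:
    """Ratio of ratios: isoform cyto/nuc divided by reference cyto/nuc.

    Each protein is normalized to its own nuclear signal first, so the
    comparison does not depend on how strongly each construct is expressed.
    """
    return cyto_to_nuc_ratio(*isoform) / cyto_to_nuc_ratio(*reference)


# Placeholder (cytoplasmic, nuclear) intensities, purely hypothetical.
flp53 = (40.0, 60.0)
isoform = (50.0, 50.0)

fold = fold_vs_reference(isoform, flp53)
print(f"isoform cyto/nuc ratio is {fold:.2f}-fold that of FLp53")
```

      Normalizing within each protein before comparing is what makes the result insensitive to overall expression level, which was the reviewer's concern with comparing absolute amounts.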

      Referees cross-commenting

      I agree that the system needs to be improved to be more physiological.

      Just to be precise, the D133 and D160 isoforms are not truncated mutants; they are naturally occurring isoforms expressed in almost every normal human cell type from an internal promoter within the TP53 gene.

      Using overexpression always raises concerns, but in this case I am even more careful because the isoforms are almost always less expressed than FLp53, and here they have to push them to 5 to 10 times the level of FLp53 to see the effect, which makes me fear an artifact due to the overwhelming overexpression (which even seems to change the normal localization of the protein).

      To visualize the endogenous proteins, they will have to change cell line as the H1299 they used are p53 null.

      Thank you for these comments. We’ve addressed the motivation of overexpression in the above responses. We needed to use the plasmid constructs in the p53-null cells to detect the proteins but the expression level was certainly not ‘overwhelmingly high’. 

      First, we tried the A549 cells (p53 wild-type) under DNA damage conditions, but the endogenous p53 protein was undetectable. Second, several studies reported increased Δ133p53 level compared to wild-type p53 and that it has implications in tumor development(2, 3, 4, 9). Third, the apoptosis activity of H1299 cells overexpressing p53 proteins was analyzed in the revised manuscript (Figure 7). The apoptotic activity induced by FLp53 expression was approximately 2.5 times higher than that of the control vector under identical plasmid DNA transfection conditions (Figure 7). These results rule out the possibility that the plasmid-based expression of p53 and its isoforms introduced artifacts in the results. We’ve discussed this in the Results section (lines 254-269).

      Reviewer #3 (Significance):

      Overall, the paper is interesting particularly considering the range of techniques used which is the main strength.

      The main limitation to me is the lack of contradictory discussion as all argumentation presents Δ133p53α and Δ160p53α exclusively as oncogenic and strictly FLp53 dominant-negative when, particularly for Δ133p53α, a quite extensive literature suggests a not so clear-cut activity.

      The aggregation mechanism is reported for the first time for Δ133p53α and Δ160p53α, although it was already published for Δ40p53α, Δ133p53β or in mutant p53.

      This manuscript would be a good basic research addition to the p53 field to provide insight in the mechanism for some activities of some p53 isoforms.

      My field of expertise is the p53 isoforms, which I have been working on for 11 years in cancer and neurodegenerative diseases.

      Thank you very much for your positive and critical comments. We’ve included a fair discussion on the oncogenic and non-oncogenic function of Δ133p53 in the Introduction following your suggestion (lines 62-73). 

      References

      (1) Pitolli C, Wang Y, Candi E, Shi Y, Melino G, Amelio I. p53-Mediated Tumor Suppression: DNA-Damage Response and Alternative Mechanisms. Cancers 11,  (2019).

      (2) Fujita K, et al. p53 isoforms Delta133p53 and p53beta are endogenous regulators of replicative cellular senescence. Nature cell biology 11, 1135-1142 (2009).

      (3) Fragou A, et al. Increased Δ133p53 mRNA in lung carcinoma corresponds with reduction of p21 expression. Molecular medicine reports 15, 1455-1460 (2017).

      (4) Bourdon JC, et al. p53 isoforms can regulate p53 transcriptional activity. Genes & development 19, 2122-2137 (2005).

      (5) Ghosh A, Stewart D, Matlashewski G. Regulation of human p53 activity and cell localization by alternative splicing. Molecular and cellular biology 24, 7987-7997 (2004).

      (6) Aoubala M, et al. p53 directly transactivates Δ133p53α, regulating cell fate outcome in response to DNA damage. Cell death and differentiation 18, 248-258 (2011).

      (7) Marcel V, et al. p53 regulates the transcription of its Delta133p53 isoform through specific response elements contained within the TP53 P2 internal promoter. Oncogene 29, 2691-2700 (2010).

      (8) Zhao L, Sanyal S. p53 Isoforms as Cancer Biomarkers and Therapeutic Targets. Cancers 14,  (2022).

      (9) Nutthasirikul N, Limpaiboon T, Leelayuwat C, Patrakitkomjorn S, Jearanaikoon P. Ratio disruption of the ∆133p53 and TAp53 isoform equilibrium correlates with poor clinical outcome in intrahepatic cholangiocarcinoma. International journal of oncology 42, 1181-1188 (2013).

      (10) Tadijan A, et al. Altered Expression of Shorter p53 Family Isoforms Can Impact Melanoma Aggressiveness. Cancers 13,  (2021).

      (11) Aubrey BJ, Kelly GL, Janic A, Herold MJ, Strasser A. How does p53 induce apoptosis and how does this relate to p53-mediated tumour suppression? Cell death and differentiation 25, 104-113 (2018).

      (12) Ghorbani N, Yaghubi R, Davoodi J, Pahlavan S. How does caspases regulation play role in cell decisions? apoptosis and beyond. Molecular and cellular biochemistry 479, 1599-1613 (2024).

      (13) Petronilho EC, et al. Oncogenic p53 triggers amyloid aggregation of p63 and p73 liquid droplets. Communications chemistry 7, 207 (2024).

      (14) Forget KJ, Tremblay G, Roucou X. p53 Aggregates penetrate cells and induce the coaggregation of intracellular p53. PloS one 8, e69242 (2013).

      (15) Farmer KM, Ghag G, Puangmalai N, Montalbano M, Bhatt N, Kayed R. P53 aggregation, interactions with tau, and impaired DNA damage response in Alzheimer's disease. Acta neuropathologica communications 8, 132 (2020).

      (16) Arsic N, et al. Δ133p53β isoform pro-invasive activity is regulated through an aggregation-dependent mechanism in cancer cells. Nature communications 12, 5463 (2021).

      (17) Melo Dos Santos N, et al. Loss of the p53 transactivation domain results in high amyloid aggregation of the Δ40p53 isoform in endometrial carcinoma cells. The Journal of biological chemistry 294, 9430-9439 (2019).

      (18) Mestrom L, et al. Artificial Fusion of mCherry Enhances Trehalose Transferase Solubility and Stability. Applied and environmental microbiology 85,  (2019).

      (19) Kaba SA, Nene V, Musoke AJ, Vlak JM, van Oers MM. Fusion to green fluorescent protein improves expression levels of Theileria parva sporozoite surface antigen p67 in insect cells. Parasitology 125, 497-505 (2002).

      (20) Snapp EL, et al. Formation of stacked ER cisternae by low affinity protein interactions. The Journal of cell biology 163, 257-269 (2003).

      (21) Jain RK, Joyce PB, Molinete M, Halban PA, Gorr SU. Oligomerization of green fluorescent protein in the secretory pathway of endocrine cells. The Biochemical journal 360, 645-649 (2001).

      (22) Campbell RE, et al. A monomeric red fluorescent protein. Proceedings of the National Academy of Sciences of the United States of America 99, 7877-7882 (2002).

      (23) Hofstetter G, et al. Δ133p53 is an independent prognostic marker in p53 mutant advanced serous ovarian cancer. British journal of cancer 105, 1593-1599 (2011).

      (24) Bischof K, et al. Influence of p53 Isoform Expression on Survival in High-Grade Serous Ovarian Cancers. Scientific reports 9, 5244 (2019).

      (25) Gong L, et al. p53 isoform Δ113p53/Δ133p53 promotes DNA double-strand break repair to protect cell from death and senescence in response to DNA damage. Cell research 25, 351-369 (2015).

      (26) Gong L, et al. p53 isoform Δ133p53 promotes efficiency of induced pluripotent stem cells and ensures genomic integrity during reprogramming. Scientific reports 6, 37281 (2016).

      (27) Gong L, Pan X, Yuan ZM, Peng J, Chen J. p53 coordinates with Δ133p53 isoform to promote cell survival under low-level oxidative stress. Journal of molecular cell biology 8, 88-90 (2016).

      (28) Candeias MM, Hagiwara M, Matsuda M. Cancer-specific mutations in p53 induce the translation of Δ160p53 promoting tumorigenesis. EMBO reports 17, 1542-1551 (2016).

      (29) Horikawa I, et al. Δ133p53 represses p53-inducible senescence genes and enhances the generation of human induced pluripotent stem cells. Cell death and differentiation 24, 1017-1028 (2017).

      (30) Mondal AM, et al. Δ133p53α, a natural p53 isoform, contributes to conditional reprogramming and long-term proliferation of primary epithelial cells. Cell death & disease 9, 750 (2018).

      (31) Joruiz SM, Von Muhlinen N, Horikawa I, Gilbert MR, Harris CC. Distinct functions of wild-type and R273H mutant Δ133p53α differentially regulate glioblastoma aggressiveness and therapy-induced senescence. Cell death & disease 15, 454 (2024).

      (32) O'Brate A, Giannakakou P. The importance of p53 location: nuclear or cytoplasmic zip code? Drug resistance updates : reviews and commentaries in antimicrobial and anticancer chemotherapy 6, 313-322 (2003).

      (33) Horikawa I, et al. Autophagic degradation of the inhibitory p53 isoform Δ133p53α as a regulatory mechanism for p53-mediated senescence. Nature communications 5, 4706 (2014).

      (34) Lee H, et al. IRE1 plays an essential role in ER stress-mediated aggregation of mutant huntingtin via the inhibition of autophagy flux. Human molecular genetics 21, 101-114 (2012).

    1. Author Response

      Reviewer #1 (Public Review):

      Weaknesses:

      1) The authors should better review what we know of fungal Drosophila microbiota species as well as the ecology of rotting fruit. Are the microbiota species described in this article specific to their location/setting? It would have been interesting to know if similar species can be retrieved in other locations using other decaying fruits. The term 'core' in the title suggests that these species are generally found associated with Drosophila but this is not demonstrated. The paper is written in a way that implies the microbiota members they have found are universal. What is the evidence for this? Have the fungal species described in this paper been found in other studies? Even if this is not the case, the paper is interesting, but there should be a discussion of how generalizable the findings are.

      The reviewer asks whether the microbial species described in this article are ubiquitously associated with Drosophila. Indeed, most of the microbes described in this manuscript are generally recognized as species associated with Drosophila spp. For example, species such as Hanseniaspora uvarum, Pichia kluyveri, and Starmerella bacillaris have been detected in or isolated from Drosophila spp. collected in European countries as well as the United States and Oceania (Chandler et al., 2012; Solomon et al., 2019). As for the bacteria, species belonging to the genera Pantoea, Lactobacillus, Leuconostoc, and Acetobacter have also previously been detected in wild Drosophila spp. (Chandler et al., 2011). These clarifications will be incorporated into our revised manuscript.

      Nevertheless, the term “core” in the manuscript title may lead to misunderstanding, as the generality does not ensure the ubiquitous presence of these microbial species in every individual fly. Considering this point, we will replace the term with an expression more appropriate to our context.

      2) Can the authors clearly demonstrate that the microbiota species that develop in the banana trap are derived from flies? Are these species found in flies in the wild? Did the authors check that the flies belong to the D. melanogaster species and not to the sister group D. simulans?

      Can the authors clearly demonstrate that the microbiota species that develop in the banana trap are derived from flies? Are these species found in flies in the wild?

      The reviewer asked whether the microbial species identified in the fermented banana samples were derived from flies. To address this question, additional experiments under more controlled conditions, such as the inoculation of specific species of wild flies onto fresh bananas, would be needed. Nevertheless, the microbes may potentially originate from wild flies, as supported by the literature cited in our response to the Weakness 1).

      Alternative sources for microbial provenance also merit consideration. For example, microbial entities may be inherently present in unfermented bananas through the infiltration of peel injuries (lines 1141-1142 of the original manuscript). In addition, they could be introduced by insects other than flies, given that both rove beetles (Staphylinidae) and sap beetles (Nitidulidae) were observed in some of the traps. These possibilities will be incorporated into the 'MATERIALS AND METHODS' and 'DISCUSSION' sections of our revised manuscript.

      Did the authors check that the flies belong to the D. melanogaster species and not to the sister group D. simulans?

      Our sampling strategy was designed to target not only D. melanogaster but also other domestic Drosophila species, such as D. simulans, that inhabit human residential areas. After adult flies were caught in each trap, we identified the species as shown in Table S1, thereby showing the presence of either or both D. melanogaster and D. simulans. We will provide these descriptions in MATERIALS AND METHODS and DISCUSSION.

      3) Did the microarrays highlight a change in immune genes (ex. antibacterial peptide genes)? Whatever the answer, this would be worth mentioning. The authors described their microarray data in terms of fed/starved in relation to the Finke article. They should clarify if they observed significant differences between species (differences between species within bacteria or fungi, and more generally differences between bacteria versus fungi).

      Did the microarrays highlight a change in immune genes (ex. antibacterial peptide genes)? Whatever the answer, this would be worth mentioning.

      Regarding the antimicrobial peptide genes, statistical comparisons of our RNA-seq data across different conditions were impracticable because most of these genes showed low expression levels (refer to Author response table 1, which shows the RNA-seq data of the yeast-fed larvae; similar expression profiles were observed in the bacteria-fed larvae). While a subset of genes exhibited significantly elevated expression in the non-supportive conditions relative to the supportive ones, this may reflect intra-sample variability rather than distinct nutritional environments. Therefore, it would be difficult to discuss a change in immune genes in the paper. Additionally, the previous study that conducted larval microarray analysis (Zinke et al., 2002) did not explicitly focus on immune genes.

      Author response table 1.

      Antimicrobial peptide genes are not up-regulated by any of the microbes. Antimicrobial peptide gene expression profiles of whole bodies of first-instar larvae fed on yeasts. TPM values of all samples and comparison results of gene expression levels in the larvae fed on supportive and non-supportive yeasts are shown. Antibacterial peptide genes mentioned in Hanson and Lemaitre, 2020 are listed. NA or na, not available.

      They should clarify if they observed significant differences between species (differences between species within bacteria or fungi, and more generally differences between bacteria versus fungi).

      We did not observe significant differences between species within bacteria or fungi, or between bacteria and fungi. For example, the gene expression profiles of larvae fed on the various supporting microbes showed striking similarities to each other, as evidenced by the heat map showing the expression of all genes detected in larvae fed either yeast or bacteria (Author response image 1). Similarities were also observed among larvae fed on distinct non-supporting microbes.

      Author response image 1.

      Gene expression profiles of larvae fed on the various supporting microbes show striking similarities to each other. Heat map showing the gene expression of the first-instar larvae that fed on yeasts or bacteria. Freshly hatched germ-free larvae were placed on banana agar inoculated with each microbe and collected after 15 h feeding to examine gene expression of the whole body. Note that data presented in Figures 3A and 4C in the original manuscript, which are obtained independently, are combined to generate this heat map. The labels under the heat map indicate the microbial species fed to the larvae, with three samples analyzed for each condition. The lactic acid bacteria (“LAB”) include Lactiplantibacillus plantarum and Leuconostoc mesenteroides, while the acetic acid bacterium (“AAB”) represents Acetobacter orientalis. “LAB + AAB” signifies mixtures of the AAB and either one of the LAB species. The asterisk in the label highlights a sample in a “LAB” condition (Leuconostoc mesenteroides), which clustered separately from the other “LAB” samples. Brown abbreviations of scientific names are for the yeast-fed conditions. H. uva, Hanseniaspora uvarum; K. hum, Kazachstania humilis; M. asi, Martiniozyma asiatica; S. cra, Saccharomycopsis crataegensis; P. klu, Pichia kluyveri; S. bac, Starmerella bacillaris; S. cer, S. cerevisiae BY4741 strain.

      Only a handful of genes showed different expression patterns between larvae fed on yeast and those fed on bacteria, without any enrichment for specialized gene functions. Thus, it is challenging to discuss the potential differential impacts, if any, of yeast and bacteria on larval growth.

      4) The whole paper - and this is one of its merits - points to a role of the Drosophila larval microbiota in processing the fly food. Are these bacterial and fungal species found in the gut of larvae/adults? Are these species capable of establishing a niche in the cardia of adults as shown recently in the Ludington lab (Dodge et al.,)? Previous studies have suggested that microbiota members stimulate the Imd pathway leading to an increase in digestive proteases (Erkosar/Leulier). Are the microbiota species studied here affecting gut signaling pathways beyond providing branched amino acids?

      The whole paper - and this is one of its merits - points to a role of the Drosophila larval microbiota in processing the fly food. Are these bacterial and fungal species found in the gut of larvae/adults? Are these species capable of establishing a niche in the cardia of adults as shown recently in the Ludington lab (Dodge et al.,)?

      Although we did not investigate the microbiota in the gut of either larvae or adults, we did compare the microbiota within surface-sterilized larvae or adults with those in food samples. We found that adult flies and early-stage food sources, as well as larvae and late-stage food sources, harbor similar microbial species (Figure 1F). Additionally, previous examinations of the gut microbiota in wild adult flies have identified microbial species or taxa congruent with those we isolated from our foods (Chandler et al., 2011; Chandler et al., 2012). We have elaborated on this in our response to Weakness 1).

      While we did not investigate whether these species are capable of establishing a niche in the cardia of adults, we will cite the study by Dodge et al., 2023 in our revised manuscript and discuss the possibility that predominant microbes in adult flies may show a propensity for colonization.

      Previous studies have suggested that microbiota members stimulate the Imd pathway leading to an increase in digestive proteases (Erkosar/Leulier). Are the microbiota species studied here affecting gut signaling pathways beyond providing branched amino acids?

      The reviewer inquires whether the supportive microbes in our study stimulate gut Imd signaling pathways and induce the expression of digestive protease genes, as demonstrated in a previous study (Erkosar et al., 2015). According to our RNA-seq data, it seems unlikely that the supportive microbes stimulate the signaling pathway. The panels in Author response image 2 show the statistical comparisons of expression levels for seven protease genes between the supportive and the non-supportive conditions. These genes did not exhibit a consistent upregulation in the presence of the supportive microbes (H. uva or K. hum in Author response image 2A; Le. mes + A. ori in Author response image 2B). Rather, they exhibited a tendency to be upregulated under the non-supportive microbes (St. bac or Pi. klu in Author response image 2A; La. pla in Author response image 2B).

      Author response image 2.

      Most of the peptidase genes reported by Erkosar et al., 2015 are more highly expressed under the non-supportive conditions than the supportive conditions. Comparison of the expression levels of seven peptidase genes derived from the RNA-seq analysis of yeast-fed (A) or bacteria-fed (B) first-instar larvae. A previous report demonstrated that the expression of these genes is upregulated upon association with a strain of Lactiplantibacillus plantarum, and that the PGRP-LE/Imd/Relish signaling pathway, at least partially, mediates the induction (Erkosar et al., 2015). H. uva, Hanseniaspora uvarum; K. hum, Kazachstania humilis; P. klu, Pichia kluyveri; S. bac, Starmerella bacillaris; La. pla, Lactiplantibacillus plantarum; Le. mes, Leuconostoc mesenteroides; A. ori, Acetobacter orientalis; ns, not significant.

      Reviewer #2 (Public Review):

      Weaknesses:

      The experimental setting that, the authors think, reflects host-microbe interactions in nature is one of the key points. However, it is not explicitly mentioned whether isolated microbes are indeed colonized in wild larvae of Drosophila melanogaster who eat bananas. Another matter is that this work is rather descriptive and a few mechanical insights are presented. The evidence that the nutritional role of BCAAs is incomplete, and molecular level explanation is missing in "interspecies interactions" between lactic acid bacteria (or yeast) and acetic acid bacteria that assure their inhabitation. Apart from these matters, the future directions or significance of this work could be discussed more in the manuscript.

      The experimental setting that, the authors think, reflects host-microbe interactions in nature is one of the key points. However, it is not explicitly mentioned whether isolated microbes are indeed colonized in wild larvae of Drosophila melanogaster who eat bananas.

      The reviewer asks whether the isolated microbes were colonized in the larval gut. Previous studies on microbial colonization associated with Drosophila have predominantly focused on adults (Pais et al. PLOS Biology, 2018), rather than larval stages. Developing larvae continually consume substrates which are already subjected to microbial fermentation and abundant in live microbes until the end of the feeding larval stage. Therefore, we consider it difficult to discuss microbial colonization in the larval gut. We will add this point in the DISCUSSION of the revised manuscript.

      Another matter is that this work is rather descriptive and a few mechanical insights are presented. The evidence that the nutritional role of BCAAs is incomplete, and molecular level explanation is missing in "interspecies interactions" between lactic acid bacteria (or yeast) and acetic acid bacteria that assure their inhabitation.

      While recognizing the importance of comprehensive mechanistic analysis, this study includes all experimentally feasible data. Elucidation of more detailed molecular mechanisms lies beyond the scope of this study and will be the subject of future research.

      Regarding the nutritional role of BCAAs, the incorporation of BCAAs enabled larvae fed with the non-supportive yeast to grow to the second instar. This observation suggests that consumption of BCAAs upregulates diverse genes involved in cellular growth processes in larvae. We have discussed the hypothetical interaction between lactic acid bacteria (LAB) and acetic acid bacteria (AAB) in the manuscript (lines 402-405): LAB may facilitate lactate provision to AAB, consequently enhancing the biosynthesis of essential nutrients such as amino acids. To test this hypothesis, future experiments will include the supplementation of AAB culture plates with lactic acid and the co-inoculation of AAB with LAB mutant strains defective in lactate production, to assess both larval growth and continuous larval association with AAB. With respect to AAB-yeast interactions, metabolites released from yeast cells might benefit AAB growth, and this possibility will be investigated through the supplementation of AAB culture plates with candidate metabolites identified in the cell suspension supernatants of the late-stage yeasts.

      Apart from these matters, the future directions or significance of this work could be discussed more in the manuscript.

      We appreciate the reviewer's recommendations and will include additional descriptions regarding these aspects in the DISCUSSION section.

      Reviewer #3 (Public Review):

      Weaknesses:

      Despite describing important findings, I believe that a more thorough explanation of the experimental setup and the steps expected to occur in the exposed diet over time, starting with natural "inoculation" could help the reader, in particular the non-specialist, grasp the rationale and main findings of the manuscript. When exactly was the decision to collect early-stage samples made? Was it when embryos were detected in some of the samples? What are the implications of bacterial presence in the no-fly traps? These samples also harbored complex microbial communities, as revealed by sequencing. Were these samples colonized by microbes deposited with air currents? Were they the result of flies that touched the material but did not lay eggs? Could the traps have been visited by other insects? Another interesting observation that could be better discussed is the fact that adult flies showed a microbiome that more closely resembles that of the early-stage diet, whereas larvae have a more late-stage-like microbiome. It is easy to understand why the microbiome of the larvae would resemble that of the late-stage foods, but what about the adult microbiome? Authors should discuss or at least acknowledge the fact that there must be a microbiome shift once adults leave their food source. Lastly, the authors should provide more details about the metabolomics experiments. For instance, how were peaks assigned to leucine/isoleucine (as well as other compounds)? Were both retention times and MS2 spectra always used? Were standard curves produced? Were internal, deuterated controls used?

      When exactly was the decision to collect early-stage samples made? Was it when embryos were detected in some of the samples?

      We collected traps and early-stage samples 2.5 days after setting up the traps. This time frame was determined by pilot experiments. A shorter collection time resulted in a greater likelihood of obtaining no-fly traps, whereas a longer collection time caused larval overcrowding, as well as adults’ deaths from drowning in the liquid seeping out of fruits. These procedural details will be delineated in the MATERIALS AND METHODS section of the revised manuscript.

      What are the implications of bacterial presence in the no-fly traps? These samples also harbored complex microbial communities, as revealed by sequencing. Were these samples colonized by microbes deposited with air currents? Were they the result of flies that touched the material but did not lay eggs? Could the traps have been visited by other insects?

      We assume that the origins of the microbes detected in the no-fly trap foods vary depending on the species. For instance, Colletotrichum musae, the fungus that causes banana anthracnose, may have been present in fresh bananas before trap placement. The filamentous fungi could have originated from airborne spores, but they could also have been introduced by insects that feed on these fungi. We will include these possibilities in the DISCUSSION section of the revised manuscript.

      Another interesting observation that could be better discussed is the fact that adult flies showed a microbiome that more closely resembles that of the early-stage diet, whereas larvae have a more late-stage-like microbiome. It is easy to understand why the microbiome of the larvae would resemble that of the late-stage foods, but what about the adult microbiome? Authors should discuss or at least acknowledge the fact that there must be a microbiome shift once adults leave their food source.

      We are grateful for the reviewer's insightful suggestions regarding shifts in the adult microbiome. We plan to include in the DISCUSSION section of the revised manuscript the possibility that the microbial composition may change substantially during pupal stages and that microbes obtained after eclosion could potentially form the adult gut microbiota.

      Lastly, the authors should provide more details about the metabolomics experiments. For instance, how were peaks assigned to leucine/isoleucine (as well as other compounds)? Were both retention times and MS2 spectra always used? Were standard curves produced? Were internal, deuterated controls used?

      We appreciate the reviewer's advice. Detailed methods of the metabolomic experiments will be included in our revised manuscript.

    1. Author Response

      We would like to thank the editors and reviewers for their thoughtful comments on our manuscript. Before we can provide a point-by-point response and submit a revised version of the manuscript we would like to provisionally address and alleviate some of their main concerns.

      A concern was expressed in the ‘eLife assessment’ and by two of the reviewers that a potential confound between the coding of sensory information and behavior outcome by IC neurons might have been introduced by combining data across different sound levels, which could challenge the conclusions of the study. In addressing this we have carried out the analysis (i.e. averaging the neural activity separately for different sound levels) suggested for distinguishing between the two alternative explanations offered by reviewer #1: that the difference in neural activity between hit and miss trials reflects a) behavior or b) sound level (more precisely: differences in response magnitude arising from a higher proportion of high-sound-level trials in the hit trial group than in the miss trial group). If the data favored b), we would expect no difference in activity between hit and miss trials when plotted separately for different sound levels. The figure in Author response image 1 indicates that this is not the case. Hit and miss trial activity are clearly distinct even when plotted separately for different sound levels, confirming that this difference in activity reflects the animals’ behavior rather than sensory information.

      Author response image 1.
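      As a rough illustration of the level-matched comparison described above, the sketch below averages per-trial responses separately for each sound level and trial outcome. It is not the analysis code behind the figure: the function name, the input layout, and the toy values are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def mean_by_level_and_outcome(responses, levels, is_hit):
    """Average per-trial responses separately for each sound level and outcome.

    Returns {level: (mean_hit, mean_miss)}. Levels lacking either outcome are
    dropped because no within-level comparison is possible there.
    """
    groups = defaultdict(lambda: {"hit": [], "miss": []})
    for response, level, hit in zip(responses, levels, is_hit):
        groups[level]["hit" if hit else "miss"].append(response)
    return {
        level: (mean(g["hit"]), mean(g["miss"]))
        for level, g in sorted(groups.items())
        if g["hit"] and g["miss"]
    }


# Toy data (hypothetical values, not taken from the study).
levels = [40, 40, 40, 40, 60, 60, 60, 60]
is_hit = [True, False, True, False, True, True, False, True]
responses = [1.0, 0.2, 1.2, 0.4, 1.5, 1.7, 0.6, 1.6]
per_level = mean_by_level_and_outcome(responses, levels, is_hit)
```

      If hit and miss means still differ within every level, as they do in this toy example, the difference cannot be attributed to an unequal mix of sound levels between the two trial groups.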

      A related concern was expressed with regards to the decoding analysis. Namely, that differences in the distributions of sound levels in the different trial types could confound the decoding into hit and miss trials and that, consequently, the results of the decoding analysis merely reflect differences in the processing of sound level. Our analysis actually aimed to take this into account but, unfortunately, we failed to include sufficient details in the methods section of the submitted manuscript. Rather than including all the trials in a given session, only trials of intermediate difficulty were used for the decoding analysis. More specifically, we only included trials across five sound levels, comprising the lowest sound level that exceeded a d prime of 1.5 plus the two sound levels below and above that level. That ensured that differences in sound level distributions would be small, while still giving us a sufficient number of trials to perform the decoding analysis. In this context, it is worth bearing in mind that a) the decoding analysis was done on a frame-by-frame basis, meaning that the decoding score achieved early in the trial has no impact on the decoding score at later time points in the trial, b) sound-driven activity can be observed predominantly immediately after stimulus onset and is largely over about 1 s into the trial (see cluster 3, for instance, or average miss trial activity in the plots above), c) decoding performance of the behavioral outcome starts to plateau 500–1000 ms into the trial and remains high until it very gradually begins to decline after about 2 s into the trial. In other words, decoding performance remains high far longer than the stimulus would be expected to have an impact on the neurons’ activity. Therefore, we would expect any residual bias due to differences in the sound level distribution that our approach did not control for to be restricted to the very beginning of the trial and not to meaningfully impact the conclusions derived from the decoding analysis.
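      The trial-selection rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the function name, the dictionary input, and the example d prime values are hypothetical.

```python
def select_decoding_levels(dprime_by_level, threshold=1.5, n_each_side=2):
    """Pick the sound levels used for decoding.

    Finds the lowest sound level whose d prime exceeds `threshold` and returns
    it together with up to `n_each_side` neighbouring levels below and above.
    Fewer levels are returned if that level lies near either end of the range.
    """
    levels = sorted(dprime_by_level)
    above = [lv for lv in levels if dprime_by_level[lv] > threshold]
    if not above:
        return []  # no level reached the criterion in this session
    centre = levels.index(above[0])
    start = max(0, centre - n_each_side)
    return levels[start:centre + n_each_side + 1]


# Hypothetical d prime values per sound level (dB SPL) for one session.
dprime = {30: 0.2, 35: 0.6, 40: 1.1, 45: 1.7, 50: 2.3, 55: 2.9, 60: 3.2}
selected = select_decoding_levels(dprime)
```

      With these example values the criterion is first exceeded at 45 dB SPL, so the selection spans 35 to 55 dB SPL. Restricting decoding to such a narrow band is what keeps the sound level distributions of hit and miss trials similar.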

      Another concern expressed in the reviews is that, in relation to the cluster-wise analysis of neural activity, no direct comparison (beyond the pie charts of Figure 5C) was provided between data from lesioned and non-lesioned groups, leaving unclear how similar task-relevant activity is between these groups. In Author response image 2 we plot, analogous to Figure 5B, the average hit and miss trial activity for the 10 clusters separately for lesioned and non-lesioned mice, illustrating more clearly the high degree of similarity between the two groups.

      Author response image 2.

    1. Author response:

      Reviewer #1 (Public review):

      (1) Some details are not described for experimental procedures. For example, what were the pharmacological drugs dissolved in, and what vehicle control was used in experiments? How long were pharmacological drugs added to cells?

      We apologise for the oversight. These details have now been added to the methods section of the manuscript as well as to the relevant figure legends.

      Briefly, latrunculin was used at a final concentration of 250 nM and Y27632 at a final concentration of 50 μM. Both drugs were dissolved in DMSO. Vehicle controls used DMSO at the higher of the two final DMSO concentrations reached in the drug treatments.

      The details of the drug treatments and their duration were added to the methods and to figures 6, S10, and S12.

      (2) Details are missing from the Methods section and Figure captions about the number of biological and technical replicates performed for experiments. Figure 1C states the data are from 12 beads on 7 cells. Are those same 12 beads used in Figure 2C? If so, that information is missing from the Figure 2C caption. Similarly, this information should be provided in every figure caption so the reader can assess the rigor of the experiments. Furthermore, how heterogenous would the bead displacements be across different cells? The low number of beads and cells assessed makes this information difficult to determine.

      We apologise for the oversight. We have now added this data to the relevant figure panels.

      To gain a further understanding of the heterogeneity of bead displacements across cells, we have replotted the relevant graphs using different colours to indicate different cells. This reveals that different cells appear to behave similarly and that the behaviour appears controlled by distance to the indentation or the pipette tip rather than cell identity.

      We agree with the reviewer that the number of cells examined is low. This is due to the challenging nature of the experiments, which means that many attempts are necessary to obtain a successful measurement.

      The experiments in Fig 1C are a verification of a behaviour documented in a previous publication [1]. Here, we just confirm the same behaviour and therefore we decided that only a small number of cells was needed.

      The experiments in Fig 2C (that allow for a direct estimation of the cytoplasm’s hydraulic permeability) require formation of a tight seal between the glass micropipette and the cell, something known as a gigaseal in electrophysiology. The success rate of this first step is 10-30% of attempts for an experienced experimenter. The second step is forming a whole cell configuration, in which a hydraulic link is formed between the cell and the micropipette. This step has a success rate of ~ 50%. Whole cell links are very sensitive to any disturbance. After reaching the whole cell configuration, we applied relatively high pressures that occasionally resulted in loss of link between the cell and the micropipette. In summary, for the 12 successful measurements, hundreds of unsuccessful attempts were carried out.
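      To give a sense of scale, the two quoted success rates can be combined in a back-of-the-envelope estimate. The sketch below assumes that the two steps are independent and ignores the additional losses of the whole cell link under pressure, so it gives a lower bound rather than a count of the attempts actually made; the function name is illustrative.

```python
def expected_attempts(n_success, p_seal, p_whole_cell):
    """Expected number of attempts needed to reach n_success measurements,
    assuming each attempt succeeds independently with p_seal * p_whole_cell."""
    return n_success / (p_seal * p_whole_cell)


# Success rates quoted in the response: gigaseal 10-30%, whole cell ~50%.
fewest = expected_attempts(12, p_seal=0.30, p_whole_cell=0.5)
most = expected_attempts(12, p_seal=0.10, p_whole_cell=0.5)
```

      Under these assumptions, 12 successful measurements correspond to roughly 80 to 240 attempts before any loss of the link during pressure application is counted.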

      (3) The full equation for displacement vs. time for a poroelastic material is not provided. Scaling laws are shown, but the full equation derived from the stress response of an elastic solid and viscous fluid is not shown or described.

      We thank the reviewer for this comment. Based on our experiments, we found that the cytoplasm behaves as a poroelastic material. However, to understand the displacements of the cell surface in response to localised indentation, we show that we also need to take the tension of the sub membranous cortex into account. In summary, the interplay between cell surface tension generated by the cortex and the poroelastic cytoplasm controls the cell behaviour. To our knowledge, no simple analytical solutions to this type of problem exist.

      In Fig 1, we show that the response of the cell to local indentation is biphasic with a short time-scale displacement followed by a longer time-scale one. In Figs 2 and 3, we directly characterise the kinetics of cell surface displacement in response to microinjection of fluid. These kinetics are consistent with the long time-scale displacement but not the short time-scale one. Scaling considerations led us to propose that tension in the cortex may play a role in mediating the short time-scale displacement. To verify this hypothesis, we have now added new data showing that the length-scale of an indentation created by an AFM probe depends on tension in the cortex (Fig S5).

      In a previous publication [2], we derived the temporal dynamics of cell surface displacement for a homogenous poroelastic material in response to a change in osmolarity. In the current manuscript, the composite nature of the cell (membrane, cortex, cytoplasm) needs to be taken into account as well as a realistic cell shape. Therefore, we did not attempt to provide an analytical solution for the displacement of the cell surface versus time in the current work. Instead, we turned to finite element modelling to show that our observations are qualitatively consistent with a cell that comprises a tensed sub membranous actin cortex and a poroelastic cytoplasm (Fig 4). We have now added text to make this clearer for the reader.

      Reviewer #2 (Public review):

      Comments & Questions:

      The authors state, "Next, we sought to quantitatively understand how the global cellular response to local indentation might arise from cellular poroelasticity." However, the evidence presented in the following paragraph appears more qualitative than strictly quantitative. For instance, the length scale estimate of ~7 μm is only qualitatively consistent with the observed ~10 μm, and the timescale 𝜏𝑧 ≈ 500 ms is similarly described as "qualitatively consistent" with experimental observations. Strengthening this point would benefit from more direct evidence linking the short timescale to cell surface tension. Have you tried perturbing surface tension and examining its impact on this short-timescale relaxation by modulating acto-myosin contractility with Y-27632, depolymerizing actin with Latrunculin, or applying hypo/hyperosmotic shocks?

      Upon rereading our manuscript, we agree with the reviewer that some of our statements are too strong. We have now moderated these and clarified the goal of that section of the text.

      The reviewer asks if we have examined the effect of various perturbations on the short time-scale displacements. In our experimental conditions, we cannot precisely measure the time-scale of the fast relaxation because its duration is comparable to the frame rate of our image acquisition. However, we examined the amplitude of the displacement of the first phase in response to sucrose treatment and we have carried out new experiments in which we treat cells with 250nM Latrunculin to partially depolymerise cellular F-actin. Neither of these treatments had an impact on the amplitude of vertical displacements (Author response image 1).

      The absence of change in response to Latrunculin may be because the treatment decreases both the elasticity of the cytoplasm E and the cortical tension γ. As the length-scale l of the deformation of the surface scales as , the two effects of latrunculin treatment may therefore compensate one another and result in only small changes in l. We have now added this data to supplementary information and comment on this in the text.

      Author response image 1:

      Amplitude of the short time-scale displacements of beads in response to AFM indentation at δx=0µm for control cells, sucrose treated cells, and cells treated with Latrunculin B. n indicates the number of cells examined and N the number of beads.

      The reviewer’s comment also made us want to determine how cortical tension affects the length-scale of the cell surface deformation created by localised micro indentation. To isolate the role of the cortex from that of cell shape, we decided to examine rounded mitotic cells. In our experiments, we indented a mitotic cell expressing a membrane targeted GFP with a sharp AFM tip (Author response image 2).

      In our experiments, we adjusted force to generate a 2μm depth indentation and we imaged the cell profile with confocal microscopy before and during indentation. Segmentation of this data allowed us to determine the cell surface displacement resulting from indentation and measure a length scale of deformation. In control conditions, the length scale created by deformation is on the order of 1.2μm. When we inhibited myosin contractility with blebbistatin, the length-scale of deformation decreased significantly to 0.8 μm, as expected if we decrease the surface tension γ without affecting the cytoplasmic elasticity. We have now added this data to our manuscript.

      Author response image 2.

      (a) Overlay of the zx profiles of a mitotic cell before (green) and during indentation (red). The cell membrane is labelled with CellMask DeepRed. The arrowhead indicates the position of the AFM tip. Scale bar 10µm. (b) Position of the membrane along the top half of the cell before (green) and during (red) indentation. The membrane position is derived from segmentation of the data in (a). Deformation is highly localised and membrane profiles overlap at the edges. The tip position is marked by an *. (c) The difference in membrane height between pre-indentation and indentation profiles plotted in (b) with the tip located at x=0. (d) Schematic of the cell surface profile during indentation and the corresponding length scale of the deformation induced by indentation. (e) Measured length scale for an indentation ~2µm for DMSO control l=1.2±0.2µm (n=8 cells) and with blebbistatin treatment (100µM) l=0.8±0.4µm (n=9 cells) (p = 0.016).

      The authors demonstrate that the second relaxation timescale increases (Figure 1, Panel D) following a hyperosmotic shock, consistent with cytoplasmic matrix shrinkage, increased friction, and consequently a longer relaxation timescale. While this result aligns with expectations, is a seven-fold increase in the relaxation timescale realistic based on quantitative estimates given the extent of volume loss?

      We thank the reviewer for this interesting question. Upon re-examining our data, we realised that the numerical values in the text related to the average rather than the median of our measurements. The median of the poroelastic time constant increases from ~0.4s in control conditions to 1.4s in sucrose, representing approximately a 3.5-fold increase.

      Previous work showed that HeLa cell volume decreases by ~40% in response to hyperosmotic shock [3]. The fluid volume fraction in cells is ~65-75%. If we assume that the water is contained in N pores of volume , we can express the cell volume as with V<sub>s</sub> the volume of the solid fraction. We can rewrite with ϕ = 0.42 -0.6. As V<sub>s</sub> does not change in response to osmotic shock, we can rewrite the volume change to obtain the change in pore size .

      The poroelastic diffusion constant scales as D<sub>p</sub> ∝ ξ<sup>2</sup>, with ξ the hydraulic pore size, and the poroelastic timescale scales as t<sub>p</sub> ∝ 1/D<sub>p</sub>. Therefore, the measured change in volume leads to a predicted increase in poroelastic diffusion time of 1.7-1.9-fold, smaller than observed in our experiments. This suggests that some intuition can be gained in a straightforward manner assuming that the cytoplasm is a homogenous porous material.
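      The scaling argument above can be checked numerically. The following is a minimal sketch under assumptions that are ours and not confirmed by the response: the fluid is held in a fixed number of pores whose volume scales as the cube of the pore size ξ, the solid volume is unchanged by the shock, the diffusion constant scales as ξ squared, and the timescale scales as the inverse of the diffusion constant at a fixed length scale. The function names are hypothetical.

```python
def pore_size_ratio(volume_change: float, fluid_fraction: float) -> float:
    """Return xi_after / xi_before for a fractional change in total cell volume.

    Assumptions (illustrative, not taken from the source):
      - the solid volume is constant, so all volume change is fluid;
      - the number of pores is constant and pore volume scales as xi**3.
    volume_change is negative for volume loss (e.g. -0.4 for a 40% decrease).
    """
    new_fluid = fluid_fraction + volume_change  # fluid volume in units of the initial cell volume
    if new_fluid <= 0:
        raise ValueError("volume loss exceeds the available fluid volume")
    return (new_fluid / fluid_fraction) ** (1.0 / 3.0)


def timescale_fold_change(volume_change: float, fluid_fraction: float) -> float:
    """Fold change in poroelastic timescale, assuming t_p ~ 1/D_p and D_p ~ xi**2."""
    return pore_size_ratio(volume_change, fluid_fraction) ** -2


# 40% volume loss, evaluated at the two ends of the quoted fluid-fraction range.
fold_low_fluid = timescale_fold_change(-0.4, 0.65)
fold_high_fluid = timescale_fold_change(-0.4, 0.75)
print(f"{fold_high_fluid:.2f} to {fold_low_fluid:.2f}-fold")
```

      With a 40% volume loss and a fluid fraction of 65-75%, this sketch gives a fold change within the 1.7-1.9 range quoted above, which is well below the measured ~3.5-fold increase.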

      However, the reality is more complex and the hydraulic pore size is distinct from the entanglement length of the cytoskeleton mesh, as we discussed in a previous publication [4]. When the fluid fraction becomes sufficiently small, macromolecular crowding will impact diffusion further and non-linearities will arise. We have now added some of these considerations to the discussion.

      If the authors' hypothesis is correct, an essential physiological parameter for the cytoplasm could be the permeability k and how it is modulated by perturbations, such as volume loss or gain. Have you explored whether the data supports the expected square dependency of permeability on hydraulic pore size, as predicted by simple homogeneity assumptions?

      We thank the reviewer for this comment. As discussed above, we have explored such considerations in a previous publication (see discussion in [4]). Briefly, we find that the entanglement length of the F-actin cytoskeleton does play a role in controlling the hydraulic pore size but is distinct from it. Membrane bounded organelles could also contribute to setting the pore size. In our previous publication, we derived a scaling relationship that indicates that four different length-scales contribute to setting cellular rheology: the average filament bundle length, the size distribution of particles in the cytosol, the entanglement length of the cytoskeleton, and the hydraulic pore size. Many of these length-scales can be dynamically controlled by the cell, which gives rise to complex rheology. We have now added these considerations to our discussion.

      Additionally, do you think that the observed decrease in k in mitotic cells compared to interphase cells is significant? I would have expected the opposite naively as mitotic cells tend to swell by 10-20 percent due to the mitotic overshoot at mitotic entry (see Son Journal of Cell Biology 2015 or Zlotek Journal of Cell Biology 2015).

      We thank the reviewer for this interesting question. Based on the same scaling arguments as above, we would expect that a 10-20% increase in cell volume would give rise to a 10-20% increase in diffusion constant. However, we also note that metaphase leads to a dramatic reorganisation of the cell interior and in particular of membrane-bounded organelles. In summary, we do not know why such a decrease could take place. We now highlight this as an interesting question for further research.

      Based on your results, can you estimate the pore size of the poroelastic cytoplasmic matrix? Is this estimate realistic? I wonder whether this pore size might define a threshold above which the diffusion of freely diffusing species is significantly reduced. Is your estimate consistent with nanobead diffusion experiments reported in the literature? Do you have any insights into the polymer structures that define this pore size? For example, have you investigated whether depolymerizing actin or other cytoskeletal components significantly alters the relaxation timescale?

      We thank the reviewer for this comment. We cannot directly estimate the hydraulic pore size from the measurements performed in the manuscript. Indeed, while we understand the general scaling laws, the pre-factors of such relationships are unknown.

      We carried out experiments aiming at estimating the hydraulic pore size in previous publications [3,4] and others have shown spatial heterogeneity of the cytoplasmic pore size [5]. In our previous experiments, we examined the diffusion of PEGylated quantum dots (14nm in hydrodynamic radius). In isosmotic conditions, these diffused freely through the cell but when the cell volume was decreased by a hyperosmotic shock, they no longer moved [3,4]. This gave an estimate of the pore radius of ~15nm.

      Previous work has suggested that F-actin plays a role in dictating this pore size but microtubules and intermediate filaments do not [4].

      There are no quantifications in Figure 6, nor is there a direct comparison with the model. Based on your model, would you expect the velocity of bleb growth to vary depending on the distance of the bleb from the pipette due to the local depressurization? Specifically, do blebs closer to the pipette grow more slowly?

      We apologise for the oversight. The quantifications are presented in Fig S10 and Fig S12. We have now modified the figure legends accordingly.

      Blebs are very heterogeneous in size and growth velocity within a cell and across cells in the population in normal conditions [6]. Other work has shown that bleb size is controlled by a competition between pressure driving growth and actin polymerisation arresting it [7]. Therefore, we did not attempt to determine the impact of depressurisation on bleb growth velocity or size.

      In experiments in which we suddenly increased pressure in blebbing cells, we did notice a change in the rate of growth of blebs that occurred after we increased pressure (Author response image 3). However, the experiments are technically challenging and we decided not to perform more.

      Author response image 3:

      A. A hydraulic link is established between a blebbing cell and a pipette. At time t>0, a step increase in pressure is applied. B. Kymograph of bleb growth in a control cell (top) and in a cell subjected to a pressure increase at t=0s (bottom). Top: In control blebs, the rate of growth is slow and approximately constant over time. The black arrow shows the start of blebbing. Bottom: The black arrow shows the start of blebbing. The dashed line shows the timing of pressure application and the red arrow shows the increase in growth rate of the bleb when the pressure increase reaches the bleb. This occurs with a delay δt.

      I find it interesting that during depressurization of the interphase cells, there is no observed volume change, whereas in pressurization of metaphase cells, there is a volume increase. I assume this might be a matter of timescale, as the microinjection experiments occur on short timescales, not allowing sufficient time for water to escape the cell. Do you observe the radius of the metaphase cells decreasing later on? This relaxation could potentially be used to characterize the permeability of the cell surface.

      We thank the reviewer for this comment.

      First, we would like to clarify that both metaphase and interphase cells increase their volume in response to microinjection. The effect is easier to quantify in metaphase cells because we assume spherical symmetry and just monitor the evolution of the radius (Fig 3). However, the displacement of the beads in interphase cells (Fig 2) clearly shows that the cell volume increases in response to microinjection. For both interphase and metaphase cells, when the injection is prolonged, the membrane eventually detaches from the cortex and large blebs form until cell lysis. In contrast to the reviewer’s intuition, we never observe a relaxation in cell volume, probably because we inject fluid faster than the cell can compensate volume change through regulatory mechanisms involving ion channels.

      When we depressurise metaphase cells, we do not observe any change in volume (Fig S10). This contrasts with the increase that we observe upon pressurisation. The main difference between these two experiments is the pressure differential. During depressurisation experiments, this is the hydraulic pressure within the cell ~500Pa (Fig 6A); whereas during pressurisation experiments, this is the pressure in the micropipette, ranging from 1.4-10 kPa (Fig 3). We note in particular that, when we used the lowest pressures in our experiments, the increase in volume was very slow (see Fig 3C). Therefore, we agree with the reviewer that it is likely the magnitude of the pressure differential that explains these differences.

      I am curious about the saturation of the time lag at 30 microns from the pipette in Figure 4, Panel E for the model's prediction. A saturation which is not clearly observed in the experimental data. Could you comment on the origin of this saturation and the observed discrepancy with the experiments (Figure E panel 2)? Naively, I would have expected the time lag to scale quadratically with the distance from the pipette, as predicted by a poroelastic model and the diffusion of displacement. It seems weird to me that the beads start to move together at some distance from the pipette or else I would expect that they just stop moving. What model parameters influence this saturation? Does membrane permeability contribute to this saturation?

      We thank the reviewer for pointing this out. In our opinion, the saturation occurring at 30 microns arises from the geometry of the model. At the largest distance away from the micropipette, the cortex becomes dominant in the mechanical response of the cell because it represents an increasing proportion of the cellular material.

      To test this hypothesis, we will rerun our finite element models with a range of cell sizes. This will be added to the manuscript at a later date.
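      The quadratic dependence that the reviewer expected corresponds to the usual diffusive scaling, in which the time lag grows as the squared distance divided by the poroelastic diffusion constant. The sketch below illustrates only that idealised scaling. It is not the finite element model, and the diffusion constant is a hypothetical value chosen for illustration, not one reported in this response.

```python
def diffusive_time_lag(distance_um: float, diffusion_um2_per_s: float) -> float:
    """Idealised diffusive time lag t ~ x**2 / D_p, in seconds.

    This ignores cell geometry and the cortex, so it cannot saturate
    at large distances from the pipette.
    """
    if diffusion_um2_per_s <= 0:
        raise ValueError("diffusion constant must be positive")
    return distance_um ** 2 / diffusion_um2_per_s


# Hypothetical poroelastic diffusion constant (um^2/s), for illustration only.
D_P_HYPOTHETICAL = 40.0

# Time lags at 10, 20 and 30 um from the pipette under pure diffusive scaling.
lags = {x: diffusive_time_lag(x, D_P_HYPOTHETICAL) for x in (10.0, 20.0, 30.0)}
for distance, lag in lags.items():
    print(f"{distance:.0f} um -> {lag:.1f} s")
```

      In this idealised picture, doubling the distance quadruples the lag at every distance, so any saturation has to come from ingredients the scaling leaves out, such as the cell geometry and the cortex discussed above.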

      Reviewer #3 (Public review):

      Weaknesses: I have two broad critical comments:

      (1) I sense that the authors are correct that the best explanation of their results is the passive poroelastic model. Yet, to be thorough, they have to try to explain the experiments with other models and show why their explanation is parsimonious. For example, one potential explanation could be some mechanosensitive mechanism that does not involve cytoplasmic flow; another could be viscoelastic cytoskeletal mesh, again not involving poroelasticity. I can imagine more possibilities. Basically, be more thorough in the critical evaluation of your results. Besides, discuss the potential effect of significant heterogeneity of the cell.

      We thank the reviewer for these comments and we agree with their general premise.

      Some observations could qualitatively be explained in other ways. For example, if we considered the cell as a viscoelastic material, we could define a time constant τ = η/E, with η the viscosity and E the elasticity of the material. The increase in relaxation time with sucrose treatment could then be explained by an increase in viscosity. However, work by others has previously shown that, in the exact same conditions as our experiment, viscoelasticity cannot account for the observations [1]. In its discussion, this study proposed poroelasticity as an alternative mechanism but did not investigate that possibility. This was consistent with our work that showed that the cytoplasm behaves as a poroelastic material and not as a viscoelastic material [4]. Therefore, we decided not to consider viscoelasticity as a possibility. We now explain this reasoning better and have added a sentence about a potential role for mechanotransductory processes in the discussion.

      (2) The study is rich in biophysics but a bit light on chemical/genetic perturbations. It could be good to use low levels of chemical inhibitors for, for example, Arp2/3, PI3K, myosin etc, and see the effect and try to interpret it. Another interesting question - how adhesive strength affects the results. A different interesting avenue - one can perturb aquaporins. Etc. At least one perturbation experiment would be good.

      We agree with the reviewer. In our previous studies, we already examined what biological structures affect the poroelastic properties of cells [2,4]. Therefore, the most interesting aspect to examine in our current work would be perturbations to the phenomenon described in Fig 6G and, in particular, to investigate what volume regulation mechanisms enable sustained intracellular pressure gradients. However, these experiments are particularly challenging and have very low throughput. Therefore, we feel that they are outside the scope of the present report and we mention them as promising future directions.

      References:

      (1) Rosenbluth, M. J., Crow, A., Shaevitz, J. W. & Fletcher, D. A. Slow stress propagation in adherent cells. Biophys J 95, 6052-6059 (2008). https://doi.org/10.1529/biophysj.108.139139

      (2) Esteki, M. H. et al. Poroelastic osmoregulation of living cell volume. iScience 24, 103482 (2021). https://doi.org/10.1016/j.isci.2021.103482

      (3) Charras, G. T., Mitchison, T. J. & Mahadevan, L. Animal cell hydraulics. J Cell Sci 122, 3233-3241 (2009). https://doi.org/10.1242/jcs.049262

      (4) Moeendarbary, E. et al. The cytoplasm of living cells behaves as a poroelastic material. Nat Mater 12, 253-261 (2013). https://doi.org/10.1038/nmat3517

      (5) Luby-Phelps, K., Castle, P. E., Taylor, D. L. & Lanni, F. Hindered diffusion of inert tracer particles in the cytoplasm of mouse 3T3 cells. Proc Natl Acad Sci U S A 84, 4910-4913 (1987). https://doi.org/10.1073/pnas.84.14.4910

      (6) Charras, G. T., Coughlin, M., Mitchison, T. J. & Mahadevan, L. Life and times of a cellular bleb. Biophys J 94, 1836-1853 (2008). https://doi.org/10.1529/biophysj.107.113605

      (7) Tinevez, J. Y. et al. Role of cortical tension in bleb growth. Proc Natl Acad Sci U S A 106, 18581-18586 (2009). https://doi.org/10.1073/pnas.0903353106

    1. Author Response

      eLife assessment

      This potentially valuable study uses classic neuroanatomical techniques and synchrotron X-ray tomography to investigate the mapping of the trunk within the brainstem nuclei of the elephant brain. Given its unique specializations, understanding the somatosensory projections from the elephant trunk would be of general interest to evolutionary neurobiologists, comparative neuroscientists, and animal behavior scientists. However, the anatomical analysis is inadequate to support the authors' conclusion that they have identified the elephant trigeminal sensory nuclei rather than a different brain region, specifically the inferior olive.

      Comment: We are happy that our paper is considered to be potentially valuable. Also, the editors highlight the potential interest of our work for evolutionary neurobiologists, comparative neuroscientists, and animal behavior scientists. The editors are more negative when it comes to our evidence on the identification of the trigeminal nucleus vs the inferior olive. We have five comments on this assessment. (i) We think this assessment is heavily biased by the comments of referee 2. We will show that the referee’s comments are more about us than about our paper. Hence, the referee failed to do their job (refereeing our paper) and should not have succeeded in leveling our paper. (ii) We have no ad hoc knock-out experiments to distinguish the trigeminal nucleus vs the inferior olive. Such experiments (extracellular recording & electrolytic lesions, viral tracing) would be done in a week in mice, but they cannot and should not be done in elephants. (iii) We have extraordinary evidence. Nobody has ever described a similarly astonishing match of body (trunk folds) and myeloarchitecture in the trigeminal system before. (iv) We will show that our assignment of the trigeminal nucleus vs the inferior olive is more plausible than the current hypothesis about the assignment of the trigeminal nucleus vs the inferior olive as defended by referee 2. We think this is why it is important to publish our paper. (v) We think eLife is the perfect place for our publication because the deviating views of referee 2 are published alongside it.

      Change: We performed additional peripherin-antibody staining to differentiate the inferior olive and trigeminal nucleus. Peripherin is a cytoskeletal protein that is found in peripheral nerves and climbing fibers. Specifically, climbing fibers of various species (mouse, rabbit, pig, cow, and human; Errante et al., 1998) are stained intensely with peripherin-antibodies. What is tricky for our purposes is that there is also some peripherin-antibody reactivity in the trigeminal nuclei (Errante et al., 1998). Such peripherin-antibody reactivity is weaker, however, and lacks the distinct axonal bundle signature that stems from the strong climbing fiber peripherin-reactivity as seen in the inferior olive (Errante et al., 1998). As can be seen in Author response image 1, we observe peripherin-reactivity in axonal bundles (i.e. in putative climbing fibers), in what we think is the inferior olive. We also observe weak peripherin-reactivity, in what we think is the trigeminal nucleus, but not the distinct and strong labeling of axonal bundles. These observations are in line with our ideas but are difficult to reconcile with the views of the referee. Specifically, the lack of peripherin-reactive axon bundles suggests that there are no climbing fibres in what the referee thinks is the inferior olive.

      Errante, L., Tang, D., Gardon, M., Sekerkova, G., Mugnaini, E., & Shaw, G. (1998). The intermediate filament protein peripherin is a marker for cerebellar climbing fibres. Journal of neurocytology, 27, 69-84.

      Author response image 1.

      The putative inferior olive but not the putative trigeminal nucleus contains peripherin-positive axon bundles (presumptive climbing fibers). (A) Overview picture of a brainstem section stained with anti-peripherin-antibodies (white color). Anti-peripherin-antibodies stain climbing fibers in a wide variety of mammals. The section comes from the posterior brainstem of African elephant cow Bibi; in this posterior region, both putative inferior olive and trigeminal nucleus are visible. Note the bright staining of the dorsolateral nucleus, the putative inferior olive according to Reveyaz et al., and the trigeminal nucleus according to Maseko et al., 2013. (B) High magnification view of the dorsolateral nucleus (corresponding to the upper red rectangle in A). Anti-peripherin-positive axon bundles (putative climbing fibers) are seen in support of the inferior olive hypothesis of Reveyaz et al. (C) High magnification view of the ventromedial nucleus (corresponding to the lower red rectangle in A). The ventromedial nucleus is weakly positive for peripherin but contains no anti-peripherin-positive axon bundles (i.e. no putative climbing fibers) in support of the trigeminal nucleus hypothesis of Reveyaz et al. Note that myelin stripes – weakly visible as dark omissions – are clearly anti-peripherin-negative.

      Reviewer #1:

      Summary:

      This fundamental study provides compelling neuroanatomical evidence underscoring the sensory function of the trunk in African and Asian elephants. Whereas myelinated tracts are classically appreciated as mediating neuronal connections, the authors speculate that myelinated bundles provide functional separation of trunk folds and display elaboration related to the "finger" projections. The authors avail themselves of many classical neuroanatomical techniques (including cytochrome oxidase stains, Golgi stains, and myelin stains) along with modern synchrotron X-ray tomography. This work will be of interest to evolutionary neurobiologists, comparative neuroscientists, and the general public, with its fascinating exploration of the brainstem of an icon sensory specialist.

      Comment: We are incredibly grateful for this positive assessment.

      Changes: None.

      Strengths:

      • The authors made excellent use of the precious sample materials from 9 captive elephants.

      • The authors adopt a battery of neuroanatomical techniques to comprehensively characterize the structure of the trigeminal subnuclei and properly re-examine the "inferior olive".

      • Based on their exceptional histological preparation, the authors reveal broadly segregated patterns of metabolic activity, similar to the classical "barrel" organization related to rodent whiskers.

      Comment: The referee provides a concise summary of our findings.

      Changes: None.

      Weaknesses:

      • As the authors acknowledge, somewhat limited functional description can be provided using histological analysis (compared to more invasive techniques).

      • The correlation between myelinated stripes and trunk fold patterns is intriguing, and Figure 4 presents this idea beautifully. I wonder - is the number of stripes consistent with the number of trunk folds? Does this hold for both species?

      Comment: We agree with the referee’s assessment. We note that cytochrome-oxidase staining is an at least partially functional stain, as it reveals constitutive metabolic activity. A significant problem of the work in elephants is that our recording possibilities are limited, which in turn limits functional analysis. As indicated in Figure 4 for the African elephant Indra, there was an excellent match of trunk folds and myelin stripes. Asian elephants have more, and less conspicuous trunk folds than African elephants. As illustrated in Figure 6, Asian elephants have more, and less conspicuous myelin stripes. Thus, species differences in myelin stripes correlate with species differences in trunk folds.

      Changes: We clarify the relation of myelin stripe and trunk fold patterns in our discussion of Figure 6.  

      Reviewer #2 (Public Review):

      The authors describe what they assert to be a very unusual trigeminal nuclear complex in the brainstem of elephants, and based on this, follow with many speculations about how the trigeminal nuclear complex, as identified by them, might be organized in terms of the sensory capacity of the elephant trunk.

      Comment: We agree with the referee’s assessment that the putative trigeminal nucleus described in our paper is highly unusual in size, position, vascularization, and myeloarchitecture. This is why we wrote this paper. We think these unusual features reflect the unique facial specializations of elephants, i.e. their highly derived trunk. Because we have no access to recordings from the elephant brainstem, we cannot back up all our functional interpretations with electrophysiological evidence; it is therefore fair to call them speculative.

      Changes: None.

      The identification of the trigeminal nuclear complex/inferior olivary nuclear complex in the elephant brainstem is the central pillar of this manuscript from which everything else follows, and if this is incorrect, then the entire manuscript fails, and all the associated speculations become completely unsupported.

      Comment: We agree.

      Changes: None.

      The authors note that what they identify as the trigeminal nuclear complex has been identified as the inferior olivary nuclear complex by other authors, citing Shoshani et al. (2006; 10.1016/j.brainresbull.2006.03.016) and Maseko et al (2013; 10.1159/000352004), but fail to cite either Verhaart and Kramer (1958; PMID 13841799) or Verhaart (1962; 10.1515/9783112519882-001). These four studies are in agreement, but the current study differs.

      Comment & Change: We were not aware of the papers of Verhaart and included them in the revised ms.

      Let's assume for the moment that the four previous studies are all incorrect and the current study is correct. This would mean that the entire architecture and organization of the elephant brainstem is significantly rearranged in comparison to ALL other mammals, including humans, previously studied (e.g. Kappers et al. 1965, The Comparative Anatomy of the Nervous System of Vertebrates, Including Man, Volume 1 pp. 668-695) and the closely related manatee (10.1002/ar.20573). This rearrangement necessitates that the trigeminal nuclei would have had to "migrate" and shorten rostrocaudally, specifically and only, from the lateral aspect of the brainstem where these nuclei extend from the pons through to the cervical spinal cord (e.g. the Paxinos and Watson rat brain atlases), to the spatially restricted ventromedial region of specifically and only the rostral medulla oblongata. According to the current paper, the inferior olivary complex of the elephant is very small and located lateral to their trigeminal nuclear complex, and the region from where the trigeminal nuclei are located by others appears to be just "lateral nuclei" with no suggestion of what might be there instead.

      Comment: We have three comments here:

      1) The referee correctly notes that we argue the elephant brainstem underwent fairly major rearrangements. In particular, we argue that the elephant inferior olive was displaced laterally, by a very large cell mass, which we argue is an unusually large trigeminal nucleus. To our knowledge, such a large compact cell mass is not seen in the ventral brain stem of any other mammal.

      2) The referee makes it sound as if it is our private idea that the elephant brainstem underwent major rearrangements and that the rest of the evidence points to a conventional ‘rodent-like’ architecture. This is far from the truth, however. Already from the outside appearance (see our Figure 1B and Figure 6A) it is clear that the elephant brainstem has huge ventral bumps not seen in any other mammal. An extraordinary architecture also holds at the organizational level of nuclei. Specifically, the facial nucleus – the most carefully investigated nucleus in the elephant brainstem – has an appearance distinct from that of the facial nuclei of all other mammals (Maseko et al., 2013; Kaufmann et al., 2022). If both the overall shape and the constituting nuclei of the brainstem are very different from other mammals, it is very unlikely if not impossible that the elephant brainstem follows in all regards a conventional ‘rodent-like’ architecture.

      3) The inferior olive is an impressive nucleus in the partitioning scheme we propose (Author response image 1). In fact – together with the putative trigeminal nucleus we describe – it’s the most distinctive nucleus in the elephant brainstem. We have not done volumetric measurements and cell counts here, but think this is an important direction for future work. What has informed our work is that the inferior olive nucleus we describe has the serrated organization seen in the inferior olive of all mammals. We will discuss these matters in depth below.

      Changes: None.

      Such an extraordinary rearrangement of brainstem nuclei would require a major transformation in the manner in which the mutations, patterning, and expression of genes and associated molecules during development occur. Such a major change is likely to lead to lethal phenotypes, making such a transformation extremely unlikely. Variations in mammalian brainstem anatomy are most commonly associated with quantitative changes rather than qualitative changes (10.1016/B978-0-12-804042-3.00045-2).

      Comment: We have two comments here:

      1) The referee claims that it is impossible that the elephant brainstem differs from a conventional brainstem architecture because this would lead to lethal phenotypes etc. Following our previous response, this argument does not hold. It is beyond question that the elephant brainstem looks very different from the brainstem of other mammals. Yet, it is also evident that elephants live. The debate we need to have is not if the elephant brainstem differs from other mammals, but how it differs from other mammals.

      2) In principle we agree with the referee’s thinking that the model of the elephant brainstem that is most likely correct is the one that requires the least amount of rearrangements relative to other mammals. We therefore prepared a comparison of the model the referee is proposing (Maseko et al., 2013; see Author response table 1 below) with our proposition. We scored these models on their similarity to other mammals. We find that the referee’s ideas (Maseko et al., 2013) require more rearrangements relative to other mammals than our suggestion.

      Changes: Inclusion of Author response table 1, which we discuss in depth below.

      The impetus for the identification of the unusual brainstem trigeminal nuclei in the current study rests upon a previous study from the same laboratory (10.1016/j.cub.2021.12.051) that estimated that the number of axons contained in the infraorbital branch of the trigeminal nerve that innervate the sensory surfaces of the trunk is approximately 400 000. Is this number unusual? In a much smaller mammal with a highly specialized trigeminal system, the platypus, the number of axons innervating the sensory surface of the platypus bill skin comes to 1 344 000 (10.1159/000113185). Yet, there is no complex rearrangement of the brainstem trigeminal nuclei in the brain of the developing or adult platypus (Ashwell, 2013, Neurobiology of Monotremes), despite the brainstem trigeminal nuclei being very large in the platypus (10.1159/000067195). Even in other large-brained mammals, such as large whales that do not have a trunk, the number of axons in the trigeminal nerve ranges between 400,000 and 500,000 (10.1007/978-3-319-47829-6_988-1). The lack of comparative support for the argument forwarded in the previous and current study from this laboratory, and that the comparative data indicates that the brainstem nuclei do not change in the manner suggested in the elephant, argues against the identification of the trigeminal nuclei as outlined in the current study. Moreover, the comparative studies undermine the prior claim of the authors, informing the current study, that "the elephant trigeminal ganglion ... point to a high degree of tactile specialization in elephants" (10.1016/j.cub.2021.12.051). While clearly, the elephant has tactile sensitivity in the trunk, it is questionable as to whether what has been observed in elephants is indeed "truly extraordinary".

      Comment: These comments made us think that the referee is not talking about the paper we submitted, but that the referee is talking about us and our work in general. Specifically, the referee refers to the platypus and other animals dismissing our earlier work, which argued for a high degree of tactile specialization in elephants. We think the referee’s intuitions are wrong and our earlier work is valid.

      Changes: We prepared Author response image 2 (below), which puts the platypus brain, a monkey brain, and the elephant trigeminal ganglion (which contains a large part of the trunk innervating cells) in perspective.

      Author response image 2.

      The elephant trigeminal ganglion is comparatively large. Platypus brain, monkey brain, and elephant ganglion. The elephant has two trigeminal ganglia, which contain the first-order somatosensory neurons. They serve mainly for tactile processing and are large compared to a platypus brain (from the comparative brain collection) and are similar in size to a monkey brain. The idea that elephants might be highly specialized for trunk touch is also supported by the analysis of the sensory nerves of these animals (Purkart et al., 2022). Specifically, we find that the infraorbital nerve (which innervates the trunk) is much thicker than the optic nerve (which mediates vision) and the vestibulocochlear nerve (which mediates hearing). Thus, not everything is large about elephants; instead, the data argue that these animals are heavily specialized for trunk touch.

      But let's look more specifically at the justification outlined in the current study to support their identification of the unusually located trigeminal sensory nuclei of the brainstem.

      (1) Intense cytochrome oxidase reactivity.

      (2) Large size of the putative trunk module.

      (3) Elongation of the putative trunk module.

      (4) The arrangement of these putative modules corresponds to elephant head anatomy.

      (5) Myelin stripes within the putative trunk module that apparently match trunk folds.

      (6) Location apparently matches other mammals.

      (7) Repetitive modular organization apparently similar to other mammals.

      (8) The inferior olive described by other authors lacks the lamellated appearance of this structure in other mammals.

      Comment: We agree those are key issues.

      Changes: None.

      Let's examine these justifications more closely.

      (1) Cytochrome oxidase histochemistry is typically used as an indicative marker of neuronal energy metabolism. The authors indicate, based on the "truly extraordinary" somatosensory capacities of the elephant trunk, that any nuclei processing this tactile information should be highly metabolically active, and thus should react intensely when stained for cytochrome oxidase. We are told in the methods section that the protocols used are described by Purkart et al (2022) and Kaufmann et al (2022). In neither of these cited papers is there any description, nor mention, of the cytochrome oxidase histochemistry methodology, thus we have no idea of how this histochemical staining was done. To obtain the best results for cytochrome oxidase histochemistry, the tissue is either processed very rapidly after buffer perfusion to remove blood or in recently perfusion-fixed tissue (e.g., 10.1016/0165-0270(93)90122-8). Given: (1) the presumably long post-mortem interval between death and fixation - "it often takes days to dissect elephants"; (2) subsequent fixation of the brains in 4% paraformaldehyde for "several weeks"; (3) The intense cytochrome oxidase reactivity in the inferior olivary complex of the laboratory rat (Gonzalez-Lima, 1998, Cytochrome oxidase in neuronal metabolism and Alzheimer's diseases); and (4) The lack of any comparative images from other stained portions of the elephant brainstem; it is difficult to support the justification as forwarded by the authors. The histochemical staining observed is likely background reactivity from the use of diaminobenzidine in the staining protocol. Thus, this first justification is unsupported.

      Comment: The referee correctly notes the description of our cytochrome-oxidase reactivity staining was lacking. This is a serious mistake of ours for which we apologize very much. The referee then makes it sound as if we messed up our cytochrome-oxidase staining, which is not the case. All successful (n = 3; please see our technical comments in the recommendation section) cytochrome-oxidase stainings were done with elephants with short post-mortem times (≤ 2 days) to brain removal/cooling and only brief immersion fixation (≤ 1 day). Cytochrome-oxidase reactivity in elephant brains appears to be more sensitive to quenching by fixation than is the case for rodent brains. We think it is a good idea to include a cytochrome-oxidase staining overview picture because we understood from the referee’s comments that we need to compare our partitioning scheme of the brainstem with that of other authors. To this end, we add a cytochrome-oxidase staining overview picture (Author response image 3) along with an alternative interpretation from Maseko et al., 2013.

      Changes: 1) We added details on our cytochrome-oxidase reactivity staining protocol and on cytochrome-oxidase reactivity in the elephant brain in general (see the recommendation section).

      2) We provide a detailed discussion of the technicalities of cytochrome-oxidase staining below in the recommendation section, where the referee raised further criticisms.

      3) We include a cytochrome-oxidase staining overview picture (Author response image 3) along with an alternative interpretation from Maseko et al., 2013.

      Author response image 3.

      Cytochrome-oxidase staining overview along with the Maseko et al. (2013) scheme. Left, coronal cytochrome-oxidase staining overview from African elephant cow Indra; the section is taken a few millimeters posterior to the facial nucleus. Brown is putatively neural cytochrome-reactivity, and white is the background. Black is myelin diffraction and (seen at higher resolution, when you zoom in) erythrocyte cytochrome-reactivity in blood vessels (see our Figure 1E-G); such blood vessel cytochrome-reactivity is seen because we could not perfuse the animal. There appears to be a minimal outside-in-fixation artifact (i.e. a more whitish/non-brownish appearance of the section toward the borders of the brain). This artifact is not seen in sections from Indra that we processed earlier or in other elephant brains processed at shorter post-mortem/fixation delays (see our Figure 1C). Right, coronal partitioning scheme of Maseko et al. (2013) for the elephant brainstem at an approximately similar anterior-posterior level.

      The same structures can be recognized left and right. The section is taken at an anterior-posterior level, where we encounter the trigeminal nuclei in pretty much all mammals. Note that the neural cytochrome reactivity is very high, in what we refer to as the trigeminal-nuclei-trunk-module and what Maseko et al. refer to as inferior olive. Myelin stripes can be recognized here as white omissions.

      At the same time, the cytochrome-oxidase-reactivity is very low in what Maseko et al. refer to as trigeminal nuclei. The indistinct appearance and low cytochrome-oxidase-reactivity of the trigeminal nuclei in the scheme of Maseko et al. (2013) is unexpected because trigeminal nuclei stain intensely for cytochrome-oxidase-reactivity in most mammals and because the trigeminal nuclei represent the elephant’s most important body part, the trunk. Staining patterns of the trigeminal nuclei as identified by Maseko et al. (2013) are very different at more posterior levels; we will discuss this matter below.

      Justifications (2), (3), and (4) are sequelae from justification (1). In this sense, they do not count as justifications, but rather unsupported extensions.

      Comment: These are key points of our paper that the referee does not discuss.

      Changes: None.

      (4) and (5) These are interesting justifications, as the paper has clear internal contradictions, and (5) is a sequelae of (4). The reader is led to the concept that the myelin tracts divide the nuclei into sub-modules that match the folding of the skin on the elephant trunk. One would then readily presume that these myelin tracts are in the incoming sensory axons from the trigeminal nerve. However, the authors note that this is not the case: "Our observations on trunk module myelin stripes are at odds with this view of myelin. Specifically, myelin stripes show no tapering (which we would expect if axons divert off into the tissue). More than that, there is no correlation between myelin stripe thickness (which presumably correlates with axon numbers) and trigeminal module neuron numbers. Thus, there are numerous myelinated axons, where we observe few or no trigeminal neurons. These observations are incompatible with the idea that myelin stripes form an axonal 'supply' system or that their prime function is to connect neurons. What do myelin stripe axons do, if they do not connect neurons? We suggest that myelin stripes serve to separate rather than connect neurons." So, we are left with the observation that the myelin stripes do not pass afferent trigeminal sensory information from the "truly extraordinary" trunk skin somatic sensory system, and rather function as units that separate neurons - but to what end? It appears that the myelin stripes are more likely to be efferent axonal bundles leaving the nuclei (to form the olivocerebellar tract). This justification is unsupported.

      Comment: The referee cites some of our observations on myelin stripes, which we find unusual. We stand by the observations and comments. The referee does not discuss the most crucial finding we report on myelin stripes, namely that they correspond remarkably well to trunk folds.

      Changes: None.

      (6) The authors indicate that the location of these nuclei matches that of the trigeminal nuclei in other mammals. This is not supported in any way. In ALL other mammals in which the trigeminal nuclei of the brainstem have been reported they are found in the lateral aspect of the brainstem, bordered laterally by the spinal trigeminal tract. This is most readily seen and accessible in the Paxinos and Watson rat brain atlases. The authors indicate that the trigeminal nuclei are medial to the facial nerve nucleus, but in every other species, the trigeminal sensory nuclei are found lateral to the facial nerve nucleus. This is most salient when examining a close relative, the manatee (10.1002/ar.20573), where the location of the inferior olive and the trigeminal nuclei matches that described by Maseko et al (2013) for the African elephant. This justification is not supported.

      Comment: The referee notes that we incorrectly state that the position of the trigeminal nuclei matches that of other mammals. We think this criticism is justified.

      Changes: We prepared a comparison of the Maseko et al. (2013) scheme of the elephant brainstem with our scheme of the elephant brainstem (see Author response table 1). Here we acknowledge the referee’s argument and we also changed the manuscript accordingly.

      (7) The dual to quadruple repetition of rostrocaudal modules within the putative trigeminal nucleus as identified by the authors relies on the fact that in the neurotypical mammal, there are several trigeminal sensory nuclei arranged in a column running from the pons to the cervical spinal cord, these include (nomenclature from Paxinos and Watson in roughly rostral to caudal order) the Pr5VL, Pr5DM, Sp5O, Sp5I, and Sp5C. However, these nuclei are all located far from the midline and lateral to the facial nerve nucleus, unlike what the authors describe in the elephants. These rostrocaudal modules are expanded upon in Figure 2, and it is apparent from what is shown that the authors are attributing other brainstem nuclei to the putative trigeminal nuclei to confirm their conclusion. For example, what they identify as the inferior olive in Figure 2D is likely the lateral reticular nucleus as identified by Maseko et al (2013). This justification is not supported.

      Comment: The referee again compares our findings to the scheme of Maseko et al. (2013) and rejects our conclusions on those grounds. We think such a comparison of our scheme is needed, indeed.

      Changes: We prepared a comparison of the Maseko et al. (2013) scheme of the elephant brainstem with our scheme of the elephant brainstem (see Author response table 1).

      (8) In primates and related species, there is a distinct banded appearance of the inferior olive, but what has been termed the inferior olive in the elephant by other authors does not have this appearance, rather, and specifically, the largest nuclear mass in the region (termed the principal nucleus of the inferior olive by Maseko et al, 2013, but Pr5, the principal trigeminal nucleus in the current paper) overshadows the partial banded appearance of the remaining nuclei in the region (but also drawn by the authors of the current paper). Thus, what is at debate here is whether the principal nucleus of the inferior olive can take on a nuclear shape rather than evince a banded appearance. The authors of this paper use this variance as justification that this cluster of nuclei could not possibly be the inferior olive. Such a "semi-nuclear/banded" arrangement of the inferior olive is seen in, for example, giraffe (10.1016/j.jchemneu.2007.05.003), domestic dog, polar bear, and most specifically the manatee (a close relative of the elephant) (brainmuseum.org; 10.1002/ar.20573). This justification is not supported.

      Comment: We carefully looked at the brain sections referred to by the referee in the brainmuseum.org collection. We found, contrary to the referee’s claims, that dogs, polar bears, and manatees have a perfectly serrated appearance of the inferior olive (a cellular arrangement in curved bands). Accordingly, we think the referee is not reporting the comparative evidence fairly and we wonder why this is the case.

      Changes: None.

      Thus, all the justifications forwarded by the authors are unsupported. Based on methodological concerns, prior comparative mammalian neuroanatomy, and prior studies in the elephant and closely related species, the authors fail to support their notion that what was previously termed the inferior olive in the elephant is actually the trigeminal sensory nuclei. Given this failure, the justifications provided above that are sequelae also fail. In this sense, the entire manuscript and all the sequelae are not supported.

      Comment: We disagree. To summarize:

      (1) Our description of the cytochrome oxidase staining lacked methodological detail, which we have now added; the cytochrome oxidase reactivity data are great and support our conclusions.

      (2)–(5) The referee does not really discuss our evidence on these points.

      (6) We were wrong and have now fixed this mistake.

      (7) The referee asks for a comparison to the Maseko et al. (2013) scheme (agreed, see Author response image 4 and Author response table 1).

      (8) The referee bends the comparative evidence against us.

      Changes: None.

      A comparison of the elephant brainstem partitioning schemes put forward by Maseko et al. (2013) and by Reveyaz et al.

      To start with, we would like to express our admiration for the work of Maseko et al. (2013). These authors did pioneering work on obtaining high-quality histology samples from elephants. Moreover, they made a heroic neuroanatomical effort, in which they assigned 147 brain structures to putative anatomical entities. Most of their data appear to refer to staining in a single elephant and one coronal sectioning plane. The data quality and the illustration of results are excellent.

      We studied mainly two large nuclei in six (now 7) elephants in three (coronal, parasagittal, and horizontal) sectioning planes. The two nuclei in question are the two most distinct nuclei in the elephant brainstem, namely an anterior ventromedial nucleus (the trigeminal trunk module in our terminology; the inferior olive in the terminology of Maseko et al., 2013) and a more posterior lateral nucleus (the inferior olive in our terminology; the posterior part of the trigeminal nuclei in the terminology of Maseko et al., 2013).

      Author response image 4 gives an overview of the two partitioning schemes for inferior olive/trigeminal nuclei along with the rodent organization (see below).

      Author response image 4.

      Overview of the brainstem organization in rodents and elephants according to Maseko et al. (2013) and Reveyaz et al. (this paper).

      The strength of the Maseko et al. (2013) scheme is the excellent match of the position of elephant nuclei to the position of nuclei in the rodent (Author response image 4). We think this positional match reflects the fact that Maseko et al. (2013) mapped a rodent partitioning scheme on the elephant brainstem. To us, this is a perfectly reasonable mapping approach. As the referee correctly points out, the positional similarity of both elephant inferior olive and trigeminal nuclei to the rodent strongly argues in favor of the Maseko et al. (2013) scheme, because brainstem nuclei are positionally very conservative.

      Other features of the Maseko et al. (2013) scheme are less favorable. The scheme marries two cyto-architectonically very distinct divisions (an anterior indistinct part and a super-distinct serrated posterior part) to be the trigeminal nuclei. We think merging entirely distinct subdivisions into one nucleus is a byproduct of mapping a rodent partitioning scheme on the elephant brainstem. Neither of the two subdivisions resembles the trigeminal nuclei of other mammals. The cytochrome oxidase staining patterns differ markedly across the anterior indistinct part (see our Author response image 4) and the posterior part of the trigeminal nuclei and do not match the intense cytochrome oxidase reactivity of other mammalian trigeminal nuclei (Referee Figure 3). Our anti-peripherin staining indicates that there are probably no climbing fibers in what Maseko et al. think is the inferior olive; this is a potentially fatal problem for the hypothesis. The posterior part of the Maseko et al. (2013) trigeminal nuclei has a distinct serrated appearance that is characteristic of the inferior olive in other mammals. Moreover, the inferior olive of Maseko et al. (2013) lacks the serrated appearance of the inferior olive seen in pretty much all mammals; this is a serious problem.

      The partitioning scheme of Reveyaz et al. comes with poor positional similarity but avoids the other problems of the Maseko et al. (2013) scheme. Our explanation for the positionally deviating location of trigeminal nuclei is that the elephant grew one of the largest, if not the largest, trigeminal systems of all mammals. As a result, the trigeminal nuclei grew through the floor of the brainstem. We understand this is a post hoc just-so explanation, but at least it is an explanation.

      The scheme of Reveyaz et al. was derived in an entirely different way from the Maseko model. Specifically, we were convinced that the elephant trigeminal nuclei ought to be very special because of the gigantic trigeminal ganglia (Purkart et al., 2022). Cytochrome-oxidase staining revealed a large distinct nucleus with an elongated shape. Initially, we were freaked out by the position of the nucleus and the fact that it was referred to as inferior olive by other authors. When we found an inferior-olive-like nucleus at a nearby (although admittedly unusual) location, we were less worried. We then optimized the visualization of myelin stripes (brightfield imaging etc.) and were able to collect an entire elephant trunk along with the brain (African elephant cow Indra). When we made the one-to-one match of Indra’s trunk folds and myelin stripes (Figure 4), we were certain that we had identified the trunk module of the trigeminal nuclei. We already noted at the outset of our rebuttal that we now consider such certainty a fallacy of overconfidence. In light of the comments of Referee 2, we feel that a further discussion of our ideas is warranted. A strength of the Reveyaz model is that nuclei look like single anatomical entities. The trigeminal nuclei look like trigeminal nuclei of other mammals, the trunk module has a striking resemblance to the trunk, and the inferior olive looks like the inferior olive of other mammals.

      We evaluated the fit of the two models in the form of a table (Author response table 1; below). Unsurprisingly, Author response table 1 aligns with our views of elephant brainstem partitioning.

      Author response table 1.

      Qualitative evaluation of elephant brainstem partitioning schemes

      ++ = very attractive; + = attractive; - = unattractive; -- = very unattractive. We scored features that are clear and shared by all mammals – as far as we know them – as very attractive. We scored features that are clear and are not shared by all mammals – as far as we know them – as very unattractive. Attractive features are either less clear or less well-shared features. Unattractive features are either less clear or less clearly not shared features.

      Author response table 1 suggests two conclusions to us. (i) The Reveyaz et al. model has mainly favorable properties. The Maseko et al. (2013) model has mainly unfavorable properties. Hence, the Reveyaz et al. model is more likely to be true. (ii) The outcome is not black and white, i.e., both models have favorable and unfavorable properties. Accordingly, we overstated our case in our initial submission and toned down our claims in the revised manuscript.

      What the authors have not done is to trace the pathway of the large trigeminal nerve in the elephant brainstem, as was done by Maseko et al (2013), which clearly shows the internal pathways of this nerve, from the branch that leads to the fifth mesencephalic nucleus adjacent to the periventricular grey matter, through to the spinal trigeminal tract that extends from the pons to the spinal cord in a manner very similar to all other mammals. Nor have they shown how the supposed trigeminal information reaches the putative trigeminal nuclei in the ventromedial rostral medulla oblongata. These are but two examples of many specific lines of evidence that would be required to support their conclusions. Clearly, tract tracing methods, such as cholera toxin tracing of peripheral nerves cannot be done in elephants, thus the neuroanatomy must be done properly and with attention to detail to support the major changes indicated by the authors.

      Comment: The referee claims that Maseko et al. (2013) showed by ‘tract tracing’ that the structures they refer to as trigeminal nuclei receive trigeminal input. This statement is at least slightly misleading. There is nothing of what amounts to proper ‘tract tracing’ in the Maseko et al. (2013) paper, i.e. tracing of tracts with post-mortem tracers. We tried proper post-mortem tracing but failed (no tracer transport), probably as a result of the limitations of our elephant material. What Maseko et al. (2013) actually did is look a bit for putative trigeminal fibers and where they might go. We also used this approach. In our hands, such ‘pseudo tract tracing’ works best in unstained material under bright field illumination, because myelin is very well visualized. In such material, we find: (i) massive fiber tracts descending dorsoventrally roughly from where both Maseko et al. (2013) and we think the trigeminal tract runs. (ii) These fiber tracts run dorsoventrally and approach what we think are the trigeminal nuclei from the lateral side.

      Changes: Ad hoc tract tracing; see above.

      So what are these "bumps" in the elephant brainstem?

      Four previous authors indicate that these bumps are the inferior olivary nuclear complex. Can this be supported?

      The inferior olivary nuclear complex acts "as a relay station between the spinal cord (n.b. trigeminal input does reach the spinal cord via the spinal trigeminal tract) and the cerebellum, integrating motor and sensory information to provide feedback and training to cerebellar neurons" (https://www.ncbi.nlm.nih.gov/books/NBK542242/). The inferior olivary nuclear complex is located dorsal and medial to the pyramidal tracts (which were not labeled in the current study by the authors but are clearly present in Fig. 1C and 2A) in the ventromedial aspect of the rostral medulla oblongata. This is precisely where previous authors have identified the inferior olivary nuclear complex and what the current authors assign to their putative trigeminal nuclei. The neurons of the inferior olivary nuclei project, via the olivocerebellar tract to the cerebellum to terminate in the climbing fibres of the cerebellar cortex.

      Comment: We agree with the referee that in the Maseko et al. (2013) scheme the inferior olive is exactly where we expect it from pretty much all other mammals. Hence, this is a strong argument in favor of the Maseko et al. (2013) scheme and a strong argument against the partitioning scheme suggested by us.

      Changes: Please see our discussion above.

      Elephants have the largest (relative and absolute) cerebellum of all mammals (10.1002/ar.22425), this cerebellum contains 257 × 10^9 neurons (10.3389/fnana.2014.00046; three times more than the entire human brain, 10.3389/neuro.09.031.2009). Each of these neurons appears to be more structurally complex than the homologous neurons in other mammals (10.1159/000345565; 10.1007/s00429-010-0288-3). In the African elephant, the neurons of the inferior olivary nuclear complex are described by Maseko et al (2013) as being both calbindin and calretinin immunoreactive. Climbing fibres in the cerebellar cortex of the African elephant are clearly calretinin immunopositive and also are likely to contain calbindin (10.1159/000345565). Given this, would it be surprising that the inferior olivary nuclear complex of the elephant is enlarged enough to create a very distinct bump in exactly the same place where these nuclei are identified in other mammals?

      Comment: We agree with the referee that it is possible and even expected from other mammals that there is an enlargement of the inferior olive in elephants. Hence, a priori one might expect the ventral brainstem bumps to be the inferior olive; this is perfectly reasonable and is what was done by previous authors. The referee also refers to calbindin and calretinin antibody reactivity. Such antibody reactivity is indeed in line with the referee’s ideas and we considered these findings in our Referee Table 1. The problem is, however, that neither calbindin nor calretinin antibody reactivity is highly specific and indeed both nuclei in discussion (trigeminal nuclei and inferior olive) show such reactivity. Unlike the peripherin-antibody staining advanced by us, calbindin and calretinin antibody reactivity cannot distinguish the two hypotheses debated.

      Changes: Please see our discussion above.

      What about the myelin stripes? These are most likely to be the origin of the olivocerebellar tract and probably only have a coincidental relationship with the trunk. Thus, given what we know, the inferior olivary nuclear complex as described in other studies, and the putative trigeminal nuclear complex as described in the current study, is the elephant inferior olivary nuclear complex. It is not what the authors believe it to be, and they do not provide any evidence that discounts the previous studies. The authors are quite simply put, wrong. All the speculations that flow from this major neuroanatomical error are therefore science fiction rather than useful additions to the scientific literature.

      Comment: It is unlikely that the myelin stripes are the origin of the olivocerebellar tract as suggested by the referee. Specifically, the lack of peripherin-reactivity indicates that these fibers are not climbing fibers (Referee Figure 1). In general, we feel the referee does not want to discuss the myelin stripes and obviously thinks we made up the strange correspondence of myelin stripes and trunk folds.

      Changes: Please see our discussion above.

      What do the authors actually have?

      The authors have interesting data, based on their Golgi staining and analysis, of the inferior olivary nuclear complex in the elephant.

      Comment: The referee reiterates their views.

      Changes: None.

      Reviewer #3 (Public Review):

      Summary:

      The study claims to investigate trunk representations in elephant trigeminal nuclei located in the brainstem. The researchers identified large protrusions visible from the ventral surface of the brainstem, which they examined using a range of histological methods. However, this ventral location is usually where the inferior olivary complex is found, which challenges the authors' assertions about the nucleus under analysis. They find that this brainstem nucleus of elephants contains repeating modules, with a focus on the anterior and largest unit which they define as the putative nucleus principalis trunk module of the trigeminal. The nucleus exhibits low neuron density, with glia outnumbering neurons significantly. The study also utilizes synchrotron X-ray phase contrast tomography to suggest that myelin-stripe-axons traverse this module. The analysis maps myelin-rich stripes in several specimens and concludes that based on their number and patterning they likely correspond with trunk folds; however, this conclusion is not well supported if the nucleus has been misidentified.

      Comment: The referee gives a concise summary of our findings. The referee acknowledges the depth of our analysis and also notes our cellular results. The referee – in line with the comments of Referee 2 – also points out that a misidentification of the nucleus under study is potentially fatal for our analysis. We thank the referee for this fair assessment.

      Changes: We feel that we need to alert the reader more broadly to the misidentification concern. We think the critical comments of Referee 2, which will be published along with our manuscript, will go a long way in doing so. We think the eLife publishing format is fantastic in this regard. We will also include pointers to these concerns in the revised manuscript.

      Strengths:

      The strength of this research lies in its comprehensive use of various anatomical methods, including Nissl staining, myelin staining, Golgi staining, cytochrome oxidase labeling, and synchrotron X-ray phase contrast tomography. The inclusion of quantitative data on cell numbers and sizes, dendritic orientation and morphology, and blood vessel density across the nucleus adds a quantitative dimension. Furthermore, the research is commendable for its high-quality and abundant images and figures, effectively illustrating the anatomy under investigation.

      Comment: Again, a very fair and balanced set of comments. We are thankful for these comments.

      Changes: None.

      Weaknesses:

      While the research provides potentially valuable insights if revised to focus on the structure that appears to be the inferior olivary nucleus, there are certain additional weaknesses that warrant further consideration. First, the suggestion that myelin stripes solely serve to separate sensory or motor modules rather than functioning as an "axonal supply system" lacks substantial support due to the absence of information about the neuronal origins and the termination targets of the axons. Postmortem fixed brain tissue limits the ability to trace full axon projections. While the study acknowledges these limitations, it is important to exercise caution in drawing conclusions about the precise role of myelin stripes without a more comprehensive understanding of their neural connections.

      Comment: The referee points out a significant weakness of our study, namely our limited understanding of the origin and targets of the axons constituting the myelin stripes. We are very much aware of this problem and this is also why we directed high-powered methodology like synchrotron X-ray tomograms to elucidate the structure of myelin stripes. Such analysis led to advances, i.e., we now think that what look like stripes are bundles, and we understand that the constituent axons tend to traverse the module. Such advances are insufficient, however, to provide a clear picture of myelin stripe connectivity.

      Changes: We think solving the problems raised by the referee will require long-term methodological advances and hence we will not be able to solve these problems in the current revision. Our long-term plans for confronting these issues are the following: (i) Improving our understanding of long-range connectivity by post-mortem tracing and MR-based techniques such as Diffusion-Tensor-Imaging. (ii) Improving our understanding of mid and short-range connectivity by applying even larger synchrotron X-ray tomograms and possibly serial EM.

      Second, the quantification presented in the study lacks comparison to other species or other relevant variables within the elephant specimens (i.e., whole brain or brainstem volume). The absence of comparative data for different species limits the ability to fully evaluate the significance of the findings. Comparative analyses could provide a broader context for understanding whether the observed features are unique to elephants or more common across species. This limitation in comparative data hinders a more comprehensive assessment of the implications of the research within the broader field of neuroanatomy. Furthermore, the quantitative comparisons between African and Asian elephant specimens should include some measure of overall brain size as a covariate in the analyses. Addressing these weaknesses would enable a richer interpretation of the study's findings.

      Comment: The referee suggests another series of topics, which include the analysis of brain parts volumes or overall brain size. We agree these are important issues, but we also think such questions are beyond the scope of our study.

      Changes: We hope to publish comparative data on elephant brain size and shape later this year.  

    1. Author Response

      eLife assessment

      This study presents a valuable method to visualize the location of the cell types discovered through single-cell RNA sequencing. The evidence supporting the claims is solid, but the inclusion of a larger number of samples would strengthen the study. It would also be helpful to have the methods explained in more detail. The work will be of interest to those seeking to identify new cell types from scRNA-seq and snRNA-seq data.

      Response: We are surprised by the editor’s assessment of our paper as a “valuable” method. This is the first Drosophila adult spatial transcriptomics paper. Hence, we would at least consider this to be an “important” method. Spatial transcriptomics has thus far only been done in embryos, which have been easy to process for FISH for many decades. Integration with single-cell data is also new. We are further surprised that this assessment does not mention the identification of subcellular mRNA patterns in adult muscles as an “important” biological finding of this paper. We are not aware that any localized mRNAs in Drosophila muscles were known prior to our study. This shows the advantage of spatial transcriptomics over single-cell techniques.

      The work indeed does not represent a full spatial atlas of the adult fly; it is, however, a proof-of-principle study covering both the head and body that we consider at least “important”.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Janssens et al. addressed the challenge of mapping the location of transcriptionally unique cell types identified by single nuclei sequencing (snRNA-seq) data available through the Fly Cell Atlas. They identified 100 transcripts for head samples and 50 transcripts for fly body samples allowing the identification of every unique cell type discovered through the Fly Cell Atlas. To map all of these cell types, the authors divided the fly body into head and body samples and used the Molecular Cartography (Resolve Biosciences) method to visualize these transcripts. This approach allowed them to build spatial tissue atlases of the fly head and body, to identify the location of previously unknown cell types and the subcellular localization of different transcripts. By combining snRNA-seq data from the Fly Cell Atlas with their spatially resolved transcriptomics (SRT) data, they demonstrated an automated cell type annotation strategy to identify unknown clusters and infer their location in the fly body. This manuscript constitutes a proof-of-principle study to map the location of the cells identified by ever-growing single-cell transcriptomic datasets generated by others.

      Strengths:

      The authors used the Molecular Cartography (Resolve Biosciences) method to visualize 100 transcripts for head samples and 50 transcripts for fly body samples in high resolution. This method achieves high resolution by multiplexing a large number of transcript visualization steps and allows the authors to map the location of unique cell types identified by the Fly Cell Atlas.

      Response: We thank the reviewer for their comment, but are surprised that this assessment does not mention the identification of subcellular mRNA patterns in adult muscles as an important biological finding of this paper. This might be due to the visualization problem that this reviewer was facing with a greyscale version of the PDF as mentioned in the comments below. We do not know what caused the technical problem for this reviewer (the PDF figures are in color on the eLife website and on bioRxiv). We are surprised that the eLife discussion session did not resolve this issue.

      Weaknesses:

      Combining single-nuclei sequencing (snRNA-seq) data with spatially resolved transcriptomics (SRT) data is challenging, and the methods used by the authors in this study cannot reliably distinguish between cells, especially in brain regions where the processes of different neurons are clustered, such as in neuropils. This means that a grid that the authors mark as a unique cell may actually be composed of processes from multiple cells.

      Response: The size of the fly is one of the most challenging aspects of performing spatial transcriptomics. The small size of the samples led to detachment from the slides, which we solved by coating the slides with gelatin. While the resolution of Molecular Cartography is high (<200nm), in the brain challenges remain as noted by the reviewer. Drosophila neuronal nuclei are notoriously small and cannot be easily resolved with current techniques. We agree that for a full atlas either expansion microscopy, 3D techniques or even higher resolution will be required.

      Reviewer #2 (Public Review):

      Summary:

      The landmark publication of the "Fly Atlas" in 2022 provided a single cell/nuclear transcriptomic dataset from 15 individually dissected tissues, the entire head, and the body of male and female flies. These data led to the annotation of more than 250 cell types. While certainly a powerful and data-rich approach, a significant step forward relies on mapping these data back to the organism in time and space. The goal of this manuscript is to map 150 transcripts defined by the Fly Atlas by FISH and in doing so, provide, for the first time, a spatial transcriptomic dataset of the adult fly. Using this approach (Molecular Cartography with Resolve Biosciences), the authors, furthermore, distinguish different RNA localizations within a cell type. In addition, they seek to use this approach to define previously unannotated clusters found in the Fly Atlas. As a resource for the community at large interested in the computational aspects of their pipeline, the authors compare the strengths and weaknesses of their approach to others currently being performed in the field.

      Strengths:

      1. The authors use Resolve Biosciences and a novel bioinformatics approach to generate a FISH-based spatial transcriptomics map. To achieve this map, they selected 150 genes (50 body; 100 head) that were highly expressed in the single nuclear RNA sequencing dataset and were used in the 2022 paper to annotate specific cell types; moreover, the authors chose several highly expressed genes characteristic of unannotated cell types. Together, the approach and generated data are important next steps in translating the transcriptomic data to spatial data in the organism.

      Response: We thank the reviewer for this comment but would like to add that the statement that we selected “150 genes (50 body; 100 head) that were highly expressed in the single nuclear RNA sequencing dataset” is not correct. We have chosen genes with widely differing expression levels (log-scale range of 3.95 in body, 5.76 in head). Many of the chosen genes are also transcription factors. In fact, the method introduced here is more sensitive than the single cell atlas: the tinman positive cells were readily located (even non-heart cells were found to express tinman), whereas in the single cell FCA data tinman expression is often not detected in the cardiomyocytes (tinman is detected in 273 cells in the entire FCA (mean expression of 1.44 UMI in positive cells), and in 71 cells out of 273 cardial cells (26%)).

      Author response image 1.

      Density plots for body (left) and head (right) showing levels of gene expression detected in scRNA-seq (body: Fly Cell Atlas, Li et al. 2022, head: Pech et al. (2023)). Blue: all genes, red: genes used in the spatial study.

      1. Working with Resolve, the authors developed a relatively high throughput approach to analyze the location of transcripts in Drosophila adults. This approach confirmed the identification of particular cell types suggested by the FlyAtlas as well as revealed interesting subcellular locations of the transcripts within the cell/tissue type. In addition, the authors used co-expression of different RNAs to unbiasedly identify "new cell types". This pipeline and data provide a roadmap for additional analyses of other time points, female flies, specific mutants, etc.

      2. The authors show that their approach reveals interesting patterns of mRNA distribution (e.g., alpha- and beta-Trypsin in apical and basal regions of gut enterocytes or striped patterns of different sarcomeric proteins in body muscle). These observations are novel and reveal unexpected patterns. Likewise, the authors use their more extensive head database to identify the location of cells in the brain. They report the resolution of 23 clusters suggested by the single-cell sequencing data, given their unsupervised clustering approach. This identification supports the use of spatial cell transcriptomics to characterize cell types (or cell states).

      3. Lastly, the authors compare three different approaches --- their own described in this manuscript, Tangram, and SpaGE - which allow integration of single cell/nuclear RNA-seq data with spatial localization FISH. This was a very helpful section as the authors compared the advantages and disadvantages (including practical issues, like computational time).

      Weaknesses:

      1. Experimental setup. It is not clear how many and, for some of the data, the sex of the flies that were analyzed. It appears that for the body data, only one male was analyzed. For the heads, methods say male and female heads, but nothing is annotated in the figures. As such, it remains unclear how robust these data are, given such a limited sample from one sex. As such, the claims of a spatial atlas of the entire fly body and its head ("a rosetta stone") are overstated. Also, the authors should clearly state in the main text and figure legends the sex, the age, how many flies, and how many replicates contributed to the data presented (not just the methods). What also adds to the confusion is the use of "n" in para 2 of the results. " ... we performed coronal sections at different depths in the head (n=13)..." 13 sections in total from 1 head or sections from 13 heads? Based on the body and what is shown in the figure, one assumes 13 sections from one head. Please clarify.

      Response: While we agree that sex differences are indeed an interesting topic to study with spatial transcriptomics, our goal was not to define male/female differences but rather to establish the technology so that this level of detail can be addressed in the future. In the revised version, we will provide a more detailed description of the sections, including their sex/genotype/age. We would like to point out that we verified the specificity of our FISH method on all the body sections (Figure 2A, TpnC4 & Act88F) and not only on one. Furthermore, we would also like to state that the idea of “a rosetta stone” was mentioned as a future prospect. We will rewrite the discussion to make this clearer.

      1. Probes selected: Information from the methods section should be put into the main text so that it is clear what and why the gene lists were selected. The current main text is confusing. If the authors want others to use their approach, then some testing or, at the very least, some discussion of lower expressed genes should be added. How useful will this approach be if only highly expressed genes can be resolved? In addition, while it is understood that the company has a proprietary design algorithm for the probes, the authors should comment on whether the probes for individual genes detect all isoforms or subsets (exons and introns?), given the high level of splicing in tissues such as muscle.

      Response: As stated above, while there is a slight bias to higher expressed genes (as expected for marker genes), we have also used very lowly expressed genes like tinman (body) or sens (head). This shows that our method is more sensitive than single-cell data, as ALL cardiomyocytes can be identified by tinman expression, whereas only some are positive in the FCA data. In fact, the method cannot resolve genes that are expressed too highly, because optical crowding of the signal leads to worse quantification. For this reason, ninaE was removed from the analysis (as mentioned in Spatial transcriptomics allows the localization of cell types in the head and brain and in Methods).

      As mentioned in the Methods, the probes are designed on gene level targeting all isoforms, but favoring principal isoforms (weighted by APPRIS level). The high level of splicing is indeed interesting and we expect that in the future spatial transcriptomics can help to generate more insight into this.

      1. Imaging: it isn't clear from the text whether the repeated rounds of imaging impacted data collection. In many of what appear to be "stitched" images, there are gradients of signal (e.g., Figure 2F); please comment. Also, since this is a new technique, could a before and after comparison of the original images and the segmented images be shown in the supplemental data so that the reader can better appreciate how the authors assessed/chose/thresholded their data? More discussion of the accuracy of spot detection would be helpful.

      Response: Any high-resolution imaging (pixel size = 138 nm) of a large field of view (>1 mm) uses a stitching method to combine several individual images to reconstruct a large field of view. This does not generate signal gradients, apart from lower signal at the extreme edges of each of the individual images. The spot detection algorithm was written and used by Resolve Biosciences and benchmarked for human (HeLa) and mouse (NIH-3T3) cell lines in Groiss et al. 2021 (Highly resolved spatial transcriptomics for detection of rare events in cells, bioRxiv). The specificity of the decoded probes was found to lie between 99.45 and 99.9% there, matching the results we found for TpnC4 and Act88F (99.4 and 99.8%). We will add their analysis to our discussion.

      1. The authors comment on how many RNAs they detected (first paragraph of results). How do these numbers compare to the total mRNA present as detected by single-cell or single-nuclear sequencing?

      Response: The total number of mRNAs detected per spatial transcriptomics experiment is much higher for the body samples compared to single-cell experiments (FCA data). In the head it is slightly lower, but here it is important to note that not all cell types are present in each slice in the head (while they are all present in the head scRNA experiments). A comparison on the cell-type level would be more meaningful, and we will investigate this for the revision.

      Author response image 2.

      Barplots showing total number of mRNA molecules detected in Molecular Cartography (Resolve, spatial spots) and in snRNA-seq data from the Fly Cell Atlas (10x Genomics, UMIs). Individual black dots show individual experiments, counts are only shown for the chosen gene panel for each sample. Bar shows the mean, with error bars representing the standard error.

      1. Using this higher throughput method of spatial transcriptomics, the authors discern different cell types and different localization patterns within a tissue/cell type.

      a. The authors should comment on the resolution provided by this approach, in terms of the detection of populations of mRNAs detected by low throughput methods, for example, in glia, motor neuron axons, and trachea that populate muscle tissue. Are these found in the images? Please show.

      Response: We did not add any markers for trachea in our gene panel, but we do detect sparse spots of repo (glia) and elav/VGlut in the muscle tissues (Gad1/VAChT are hardly detected in the muscle tissue). This is consistent with the glutamatergic nature of motor neurons in Drosophila as described previously (Schuster CM (2006) Glutamatergic synapses of Drosophila neuromuscular junctions: a high-resolution model for the analysis of experience-dependent potentiation. Cell Tissue Res 326: 287–299.)

      Author response image 3.

      Molecular Cartography zoomed in on indirect flight muscle. Segmented nuclei are shown in white (based on DAPI); scale bars represent 100 μm.

      b. The authors show interesting localization patterns in muscle tissue for different sarcomere protein-coding mRNAs, including enrichment of sls in muscle nuclei located near the muscle-tendon attachment sites. As this high throughput approach is newly being applied to the adult fly, it would increase confidence in these data if the authors would confirm them using a low throughput FISH technique. For example, do the authors detect such alternating "stripes" (Act88F, TpnC4, and Mhc) or enriched localization (sls) using FISH that doesn't rely on the repeated colorization, imaging, and decolorization of the probes?

      Response: We thank the reviewer for their interest in the localization patterns in muscle tissue. We could confirm localized mRNA in all the sections analyzed, in flight muscles as well as in leg muscles. We furthermore show that Act88F and TpnC4 are not detected outside of flight muscle cells (99.4% and 99.8% of the single-molecule signal, respectively, lies in flight muscles). Hence, we already show the specificity test in a much more quantitative way compared to traditional FISH, which often includes amplification.
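
The specificity figures quoted in this response are fractions of detected spots that fall inside the expected tissue. A minimal sketch of that arithmetic follows, assuming each spot has already been assigned a tissue label. The function name, the label strings, and the spot counts are hypothetical and chosen only for illustration; they are not taken from the dataset or from the Resolve pipeline.

```python
from collections import Counter


def tissue_specificity(spot_labels, target="flight_muscle"):
    """Return the percentage of spots whose tissue label equals `target`."""
    counts = Counter(spot_labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no spots supplied")
    return 100.0 * counts[target] / total


# Hypothetical input: 1000 spots for one gene, 994 of them inside flight muscle.
labels = ["flight_muscle"] * 994 + ["other"] * 6
specificity = tissue_specificity(labels)
print(f"{specificity:.1f}% of spots lie in flight muscle")
```

In practice the tissue label would come from a segmentation mask, so the reported percentage depends on how that mask is drawn.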

      1. The authors developed an unbiased method to identify "new cell types" which relies on co-expression of different transcripts. Are these new cell types or a cell state? While expression is a helpful first step, without any functional data, the significance of what the authors found is diminished. The authors need to soften their statements.

      Response: The term “new cell types” only appears in the title. We agree that with the current spatial map we cannot be sure to have found “new cell types”, instead we have shown where unannotated clusters from scRNA-seq map, based on gene expression. Therefore, we will tone down the title in the revised version and thank the reviewer for this valuable suggestion.

      Appraisal:

      The authors' goal is to map single cell/nuclear RNAseq data described in the 2022 Fly Atlas paper spatially within an organism to achieve a spatial transcriptomic map of the adult fly; no doubt, this is a critical next step in our use of 'omics approaches. While this manuscript does the hard work of trying to take this next step, including developing and testing a new pipeline for high throughput FISH and its analysis, it falls short, in its present form, in achieving this goal. The authors discuss creating a robust spatial map, based on one male fly. Moreover, they do not reveal principles of mRNA localization, as stated in the abstract; they show us patterns, but nothing about the logic or function of these patterns. This same criticism can be said of the identification of "new cell types", just based on RNA colocalization. In both cases (mRNA subcellular localization or cell type identification), further data in the form of validation with traditional low throughput FISH and genetic manipulations to assess the relation to cell function are required for the authors to make such claims.

      Response: We have indeed used one male fly for the adult male body data. This is mainly due to the cost of the sample processing. We used 12 individuals for the head samples (from 1 individual we acquired 2 sections, giving a total of 13 sections). The body samples correlate highly with each other, while the head samples cover multiple depths of the head. Still, even in the head, we find that sections at similar depths show a high similarity to each other in terms of gene-gene co-expression and expression patterns. Although obtaining more sections would be valuable, we don’t believe it to be necessary for the current goals. Additional replicates beyond the ones we already provide would require significant amounts of extra time and budget, while they would produce results similar to those we already show. We are therefore reluctant to repeat the effort.

      The usage of the term “new cell types” is indeed ambiguous and we will tone this down in the revised version. Instead, we meant that unannotated clusters could be mapped to their location. In the text, we further specify that this means that now we only have inferred the location of the nuclei and that for neurons their function/processes are still unknown. As such, our data provides a starting point to identify new cell types since their marker genes and nuclear location are inferred. The next step to identify “new cell types” would indeed be to acquire genetic access to the cell types and characterize them in more detail. This is currently beyond our goals, and therefore we will tone down the title in the revised version and thank the reviewer for this valuable suggestion.
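
The similarity of gene-gene co-expression between sections mentioned in this response can be quantified in several ways. The sketch below shows one common option and is not necessarily the procedure used in the manuscript: compute the Pearson correlation for every gene pair within each section, then correlate those pairwise values between two sections. All function names and the toy counts are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    if den == 0:
        raise ValueError("correlation is undefined for constant input")
    return num / den


def coexpression_vector(counts):
    """counts maps gene -> per-cell counts; returns one correlation per gene pair."""
    genes = sorted(counts)
    return [pearson(counts[a], counts[b]) for a, b in combinations(genes, 2)]


def section_similarity(section_a, section_b):
    """Correlate the gene-gene co-expression values of two sections."""
    return pearson(coexpression_vector(section_a), coexpression_vector(section_b))


# Toy data: section_b has exactly twice the counts of section_a,
# so the co-expression structure of the two sections is identical.
section_a = {"geneA": [1, 2, 3, 4], "geneB": [2, 4, 6, 9], "geneC": [4, 3, 2, 1]}
section_b = {g: [2 * v for v in vals] for g, vals in section_a.items()}
print(round(section_similarity(section_a, section_b), 6))
```

Both sections must contain the same genes for the pairwise vectors to line up, which is why the genes are sorted before pairing.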

      Discussion of likely impact:

      If revised, these data, and importantly the approach, would impact those working on Drosophila adults as well as those working in other model systems where single cell/nuclear sequencing is being translated to the spatial localization within the organism. The subcellular localization data - for example, the size of transcripts and how that relates to localization or the patterns of sarcomeric protein localization in muscle - are intriguing, and would likely impact our thinking on RNA localization, transport, etc if confirmed. Lastly, the authors compare their computational approaches to those available in the field; this is valuable as this is a rapidly evolving field and such considerations are critical for those wishing to use this type of approach.

      Response: We believe that our manuscript as it stands now is already an “important” paper that will strongly impact the Drosophila community (and, beyond that, the spatial transcriptomics community). As it stands, it provides the groundwork for a full Drosophila adult spatial atlas, similar to how early scRNA-seq datasets provided a framework for the Fly Cell Atlas. In the manuscript we provide experimental information on how to successfully perform spatial transcriptomics (treating slides for optimal attachment), and the data serve as a benchmark for future experiments to improve upon (similar to how early Drop-seq datasets were compared to later 10x datasets in single-cell transcriptomics). In addition, it also provides proof-of-principle methods on how to integrate the FCA data with these spatial data, and it identifies localized mRNA species in large adult muscle cells, showing the complementarity of spatial techniques with single-cell RNA-seq. To conclude, this is the first spatial adult Drosophila transcriptomics paper, locating 150 mRNA species with easy data access in our user portal (https://spatialfly.aertslab.org/).

    1. Author response:

      Reviewer #1 (Evidence, reproducibility and clarity (Required)): 

      Summary: 

      Laura Morano and colleagues have performed a screen to identify compounds that interfere with the formation of TopBP1 condensates. TopBP1 plays a crucial role in the DNA damage response, and specifically the activation of ATR. They found that the GSK-3β inhibitor AZD2858 reduced the formation of TopBP1 condensates and activation of ATR and its downstream target CHK1 in colorectal cancer cell lines treated with the clinically relevant irinotecan active metabolite SN-38. This inhibition of TopBP1 condensates by AZD2858 was independent from its effect on GSK-3β enzymatic activity. Mechanistically, they show that AZD2858 thus can interfere with intra-S-phase checkpoint signaling, resulting in enhanced cytostatic and cytotoxic effects of SN-38 (or SN-38+Fluorouracil aka FOLFIRI) in vitro in colorectal carcinoma cell lines.

      Major comments: 

      Overall the work is rigorous and the main conclusions are convincing. However, they only show the effects of their combination treatments on colorectal cancer cell lines. I'm worried that blocking the formation of TopBP1 condensates will also be detrimental in non-transformed cells. Furthermore it is somewhat disappointing that it remains unclear how AZD2858 blocks self-assembly of TopBP1 condensates, although I understand that unraveling this would be complex and somewhat out-of-reach for now.

      We appreciate your feedback and fully recognize the importance of understanding how AZD2858 blocks the assembly of TopBP1 condensates. While we understand your disappointment, addressing this question remains a key focus for us. Keeping in mind that unravelling such a mechanism in vitro or in vivo is rather challenging, we have consulted an expert who has made efforts to predict the potential docking sites of AZD2858 on TopBP1, which may provide valuable insights for future experimental investigations. Using an AlphaFold model (no crystal or cryo-EM structure is available) and looking for suitable pockets or cavities in which AZD2858 could bind, the analyses, though requiring cautious interpretation, suggested that AZD2858 may target the BRCT1 and BRCT8 domains of TopBP1 (as shown below, two pockets, n°1 and 7, with sufficient volume and surrounded by β-sheet structures, as for other GSK3 inhibitors).

      However, these are preliminary results that require further exploration and experimental validation to confirm their significance and mechanistic implications.

      Author response image 1.

      Here are some specific points for improvement: 

      (1) The authors conclude that "These data supports [sic] the feasibility of targeting condensates formed in response to DNA damage to improve chemotherapy-based cancer treatments". To support this conclusion the authors need to show that proliferating non-transformed cells (e.g. primary cell cultures or organoids) can tolerate the combination of AZD2858 + SN-38 (or FOLFIRI) better than colorectal cancer cells. 

      We would like to thank the reviewer for this vital suggestion to prove that this combination is effective on tumor cells and not very toxic on healthy cells. We therefore used a healthy colon cell line (CCD841) and tested the efficacy of each treatment alone (FOLFIRI and AZD2858) as well as the combination FOLFIRI+AZD2858. We compared the results obtained in the CCD841 cell line with those obtained in the HCT116 colorectal cancer cell line. The results presented below show not only that each treatment alone is much less effective on CCD841 lines, but also that the combination is not synergistic.

      Author response image 2.

      Page 19 "This suggests that the combination... arrests the cell cycle before mitosis in a DNA-PKcs-dependent manner." I find the remark that this arrest would be DNA-PKcs-dependent too speculative. I suppose that the authors base this claim on reference 55 but if they want to support this claim they need to prove this by adding DNA-PKcs inhibitors to their treated cells.

      Thank you for your thoughtful comment. We agree with the reviewer that claiming the G2/M arrest is DNA-PKcs-dependent without direct experimental evidence is speculative. While we initially based this hypothesis on reference 55, we acknowledge that further experiments, such as the use of DNA-PKcs inhibitors, would be necessary to robustly support this claim.

      Given that this observation was intended as a potential explanation for the G2/M arrest observed at 6 and 12 hours of treatment with AZD2858 + SN-38 (compared to SN-38 alone), and considering that exploring this pathway is not the primary focus of our study, we have decided to remove this hypothesis from both the figure and the text to avoid any ambiguity.

      We appreciate the reviewer’s input and will consider investigating this pathway in future studies.

      (2) When discussing Figure S5B the authors claim that SN-38 + AZD2858 progressively increases the fractions of BrdU positive cells, but this is not supported by statistical analysis.

      The fractions are still very small, so I would like to see statistics on these data. Alternatively, the authors could take out this conclusion. 

      Thank you for your valuable comment. In response, we have conducted a statistical analysis (Mann-Whitney test) on the data, and the results have been added to Figure S5C for the 6-hour time point and Figure S5D for the 12-hour time point, based on three independent biological replicates. We hope this provides the necessary clarification.

      Minor comments: 

      - Page 5 Materials and methods - Cell culture. Last sentence "Add in what medium you cultured them" looks like an internal review remark and should probably be removed? 

      We apologize for this oversight. The medium has now been specified, and the sentence has been removed.

      - The numbers in all the synergy matrices (in white font) are extremely small and virtually unreadable, and visually distracting. I recommend taking these out altogether. 

      We believe that the reduction in figure quality may be due to the PDF compression, which affected the resolution of the figures. We are happy to provide high-resolution versions of the figures separately for clarity. If the issue persists even with the higher resolution, we will consider removing the numbers, as suggested.

      - The legends of the synergy matrices (for example Fig 1D, 4E, 5, 6) are often extremely small, making it difficult to understand them intuitively. Please enlarge them and label them more clearly, and use larger fonts. In the legend of Figure 5D,E a green matrix indicating % live cells is mentioned but I don't see it. Do they mean the grey matrix? 

      We have enlarged the figure legends and will provide high-resolution versions of the figures to ensure all details are clearly readable. Regarding Figure 5D,E: we acknowledge that the color may appear differently (more green or gray) depending on the display or printer settings. To avoid any confusion, we have corrected the legend to specify that the color in question is khaki, rather than green. Moreover, following suggestions of the reviewer #2, these figures have been respectively moved to Figure S6B and S6C.

      - Figure S2. Perhaps I misunderstand the PML body experiment but the authors seem to use PML body formation to support their idea that AZD2858 blocks TopBP1 condensate formation and not just any condensate formation. However, if this is the case they would need a proper positive control, i.e. an additional experimental condition in which they do see PML bodies.

      Arsenic is a well-known positive control for experiments involving PML bodies due to its ability to induce specific responses in PML proteins and modify the structure and function of PML nuclear bodies (NBs) (Jaffray et al., 2023, JCB; Zhu et al., 1997, PNAS). Thus, we used arsenic as a positive control and observed a significant increase in PML NBs vs the other conditions (Kruskal-Wallis test), as indicated below. We have therefore added these results to Figure S2B and to the text.

      Author response image 3.

      PML condensates were tested after 2 h of incubation. AZD2858 : 100nM ; SN-38 : 300nM ; Arsenic : 6µM. ****: p<0.0001 (Kruskal-Wallis test).

      - The quantification of the flow cytometry data needs to be clarified. I find it strange that in the figures (for example Figure 3A and 3C) representative examples are shown of apparently 3 replicates, and that the percentages shown in these examples are then given in the text as the overall numbers; for example on page 18 "...BrdU incorporation increased from 16.11% (SN38 alone) to 41.83% (combination)...". This type of description is done in multiple places in the Results section and is confusing. It would be clearer if the authors show proper quantifications (mean +/- sem) of the percentages of (the relevant) gated populations. Besides, I don't think it makes a lot of sense to mention in the text the percentages with 2 decimals behind the comma. This suggests a level of precision that does not seem justified in flow cytometry data. Finally, all flow cytometry plots look visually very busy and all the text is crammed in with really small fonts. Cleaning them up and enlarging the fonts of the remaining text/numbers would really improve the readability of the figures.

      Thank you for your helpful comments. We understand your concern regarding the flow cytometry quantification. Indeed, the percentages presented in the figures are derived from representative replicates, and we acknowledge that this presentation could be confusing. To address this, we have included a table summarizing the data from all replicates to improve readability [Table S2 and S3 in the new version]. Second, we specified in the text that the data are representative biological replicates when needed. Third, we have performed statistical analyses on the three replicates when necessary, as shown in Supplementary Figure S5C-F in the new version. The text has been revised to reflect the correct statistical interpretation.

      Regarding the use of two decimal places, we are unable to remove them due to limitations in the software (Kaluza) used for flow cytometry analysis. However, we agree that this level of precision may not be warranted, and we have revised the text where appropriate to reduce confusion.
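
      For illustration only, the summary requested by the reviewer (mean ± SEM of a gated percentage across replicates, reported without spurious decimals) can be sketched as follows. The function names and the replicate values are hypothetical placeholders, not data or code from the study.

```python
from math import sqrt
from statistics import mean, stdev


def summarize(percentages):
    """Return (mean, SEM) for a list of per-replicate gated percentages."""
    n = len(percentages)
    if n < 2:
        raise ValueError("need at least two replicates to estimate the SEM")
    # SEM = sample standard deviation / sqrt(number of replicates)
    return mean(percentages), stdev(percentages) / sqrt(n)


def format_summary(percentages):
    """Format mean ± SEM rounded to whole percentages."""
    m, sem = summarize(percentages)
    return f"{m:.0f} ± {sem:.0f}% (n = {len(percentages)})"


# Hypothetical placeholder values, NOT measurements from the manuscript.
replicates = {
    "SN-38 alone": [14.0, 16.0, 18.0],
    "SN-38 + AZD2858": [38.0, 42.0, 46.0],
}

for condition, values in replicates.items():
    print(condition, format_summary(values))
```

      Rounding happens only at the formatting step, so the underlying replicate values stay available at full precision for statistical testing.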

      - In Figure 5G the authors show that FOLFIRI + AZD2858 are synergistic in two SN-38-resistant cell lines. They conclude that this combination may overcome drug resistance. But I tried to figure out the FOLFIRI concentrations used in these cell lines, and they still seem far higher than in the SN-38-sensitive HCT116 cell lines, so I would like to see a bit more nuance in their interpretation. I think overcoming drug resistance is an overstatement, and perhaps alleviating would be a better term.

      Thank you for highlighting this important point; we have adjusted the text accordingly.

      - The legend in Table S2 refers to Figure 5A-B; this should be Figure 4A-B. 

      Thank you, this has been corrected, and Table S2 is now Table S4.

      Reviewer #1 (Significance (Required)): 

      The finding that AZD2858 blocks TOPbp1 condensate formation via a pleiotropic effect of this compound is interesting and convincing. To the best of my knowledge it's a novel finding which is interesting to the potential target audience mentioned below. Their findings that inhibition of TOPbp1 condensation and ATR signaling via AZD2858 may synergize with FOLFIRI therapy in colorectal cancer cells are still very preliminary, because the effects on non-cancerous cells are not tested.

      Researchers involved in early cancer drug discovery and cell biologists studying DNA damage responses in cancer cells seem to me the typical audience interested in and influenced by this paper.

      I'm a cell biologist studying cell cycle fate decisions, and adaptation of cancer cells & stem cells to (drug-induced) stress. My expertise aligns well with the work presented throughout this paper. 

      Reviewer #2 (Evidence, reproducibility and clarity (Required)): 

      The authors have extended their previous research to develop TOPBP1 as a potential drug target for colorectal cancer by inhibiting its condensation. Utilizing an optogenetic approach, they identified the small molecule AZD2858, which inhibits TOPBP1 condensation and works synergistically with first-line chemotherapy to suppress colorectal cancer cell growth. The authors investigated the mechanism and discovered that disrupting TOPBP1 assembly inhibits the ATR/Chk1 signaling pathway, leading to increased DNA damage and apoptosis, even in drug-resistant colorectal cancer cell lines. Addressing the following concerns would enhance clarity and further in vivo work may improve significance: 

      (1) How does the optogenetic method for inducing condensates compare to the DNA damage induction mechanism? 

      Optogenetics provides a versatile and precise approach for controlling the condensation of scaffold proteins in both space and time. This method enables us to study the role of biomolecular condensates with minute-scale resolution, separating their formation from potentially confounding upstream events, such as DNA damage, and providing valuable insights into their specific function. Importantly, based on our previous publications on TopBP1 or SLX4 optogenetic condensates, we have substantial evidence indicating that light-induced condensates closely mimic those formed in response to DNA damage:

      - Functional similarity: Optogenetic condensates recapitulate the endogenous condensates formed upon exposure of cells to DNA-damaging agents, and include most known partner proteins involved in the DNA damage response. This was shown for light-induced TopBP1 and SLX4 condensates (1-3).

      - Dynamic reversibility: Optogenetic condensates and DNA damage induced condensates are both dynamic and reversible. They dissolve within 15 minutes of light deactivation or after removal of the damaging agent (1,3).

      - Chromatin association: Both optogenetic and DNA damage-induced condensates are bound to chromatin or localized at sites of DNA damage (3).

      - Regulation: Both types of condensates are regulated similarly, with their formation triggered by the same signaling pathways. ATR basal activity drives the nucleation of opto-TopBP1 condensates and endogenous TopBP1 structures upon light exposure (1). Likewise, sumoylation modifications regulate the formation of opto-SLX4 condensates and endogenous SLX4 condensates (3).

      - Structure: Using super-resolution imaging by stimulated emission depletion (STED) microscopy, we observed that endogenous SLX4 nanocondensates formed globular clusters that were indistinguishable from recombinant light-induced SLX4 condensates (1,3).

      (1) Frattini C, Promonet A, Alghoul E, Vidal-Eychenie S, Lamarque M, Blanchard MP, et al. TopBP1 assembles nuclear condensates to switch on ATR signaling. Molecular Cell. 18 March 2021;81(6):1231-1245.e8.

      (2) Alghoul E, Basbous J, Constantinou A. An optogenetic proximity labeling approach to probe the composition of inducible biomolecular condensates in cultured cells. STAR Protocols. 2021;2(3):100677. 

      (3) Alghoul E, Basbous J, Constantinou A. Compartmentalization of the DNA damage response: Mechanisms and functions. DNA Repair. August 2023;128:103524.

      (2) Why wasn't the initial screen conducted on the HCT116-SN50 resistant cell line? 

      Thank you for raising this important question, which we also considered at the outset of the project. After careful consideration, we decided to use the HCT116 WT cells in order to obtain initial data from an unmodified cell line. It is worth mentioning that HCT116-SN50 cells exhibit slower proliferation compared to WT cells, and they also express an efflux pump capable of pumping out SN38. We were concerned that these factors might interfere with the optogenetic assay, which is why we chose to perform the screen using the WT HCT116 cells.

      (3) The labels in Fig. 1D are difficult to recognize. 

      This issue was also raised by Reviewer #1. We suspect that the PDF conversion may have reduced the resolution of the figures, so we will provide them separately in high resolution. In addition, we have increased the size of some labels to improve their clarity.

      The selected cell image in Fig. 2A for SN-38 seems over-representative; unselected cells appear similar to other groups. Why does AZD2858 itself induce TopBP1 condensates in the plot, yet this is not evident in the images? 

      Thank you for your comment; we have updated the figure with a more representative image. We indeed observe that AZD2858 alone induces a slight increase in TopBP1 condensates. However, this increase did not lead to the activation of the ATR/Chk1 signaling pathway, as shown by the Western blot data presented in Fig. 2B. In addition, AZD2858 specifically prevents the formation of TopBP1 condensates induced by SN38 treatment, and the level of TopBP1 condensates does not return to the basal levels observed in untreated cells, but rather to those observed with AZD2858 treatment. During the 2-hour AZD2858 treatment, the progression of replication forks was unaffected (Fig. 3A and 3B). However, when AZD2858 was added alone to the Xenopus egg extracts, there was increased recruitment of TopBP1 to the chromatin (Fig. 2E). This result suggests that AZD2858 alone can induce the assembly of TopBP1 on chromatin to initiate DNA replication (a well-established role of TopBP1), but the number and concentration of TopBP1 molecules did not reach levels sufficient to activate the ATR/Chk1 pathway.

      (4) In Fig. 3A, despite the drastic change in the FACS plot shape, the quantifications appear quite similar. 

      Thank you for this insightful observation. The gates for the S phase were intentionally set wide to avoid biasing the results and inadvertently excluding the population that incorporates BrdU weakly (but still incorporates it) in the SN-38-only condition. As a result, the percentage of cells within this gate remains similar, even though the overall shape of the FACS plot changes, reflecting a shift in the distribution of BrdU incorporation. This point has now been clarified in the legend of Figure 3A.

      This effect can also be attributed to the relatively short treatment time (2 hours), which captures early changes in DNA synthesis. The effect becomes more pronounced at later time points, as shown in Figure 3C. For example, after 6 hours of treatment, the percentage of BrdU-positive cells increases from 15% with SN-38 alone to 41% with the AZD2858 combination, demonstrating a clearer impact on DNA synthesis. A graph summarizing the statistical analysis has been added to Figure S5C for the 6-hour time point and Figure S5D for the 12-hour time point, based on data from three independent biological replicates.

      (5) The results section is imbalanced; Figs. 5 and 6 could be combined into one figure. 

      We have combined Figures 5 and 6 into a single figure to optimize the presentation of results. To avoid overloading the new figure, some of the data have been moved to supplementary figures, ensuring the main figure remains clear and focused.

      (6) An in vivo study is anticipated to assess the drug's efficacy. 

      Although AZD2858 was developed a few years ago, there is a limited amount of in vivo data available, which led us to consider potential issues related to the drug's biodistribution or its pharmacokinetics (PK). Despite these concerns, we proceeded with preliminary in vivo studies, testing various diluents and injection routes for AZD2858. However, we observed that the compound was not effective in vivo. Given the strong synergistic effects observed in vitro, we concluded that AZD2858 was likely not being distributed properly in the mice. As a result, we have decided to conduct a more detailed investigation into the pharmacokinetics (PK), pharmacodynamics (PD), and absorption, distribution, metabolism, and excretion (ADME) of AZD2858 to better understand its in vivo behavior and efficacy. Therefore, the in vivo evaluation of AZD2858 will be addressed in a separate study specifically focused on this aspect.

      Reviewer #2 (Significance (Required)): 

      Addressing the stated concerns would enhance clarity and further in vivo work may improve significance. 

      Reviewer #3 (Evidence, reproducibility and clarity (Required)): 

      Summary 

      In 2021 (PMID: 33503405) and 2024 (PMID: 38578830) Constantinou and colleagues published two elegant papers in which they demonstrated that the Topbp1 checkpoint adaptor protein could assemble into mesoscale phase-separated condensates that were essential to amplify activation of the PIKK, ATR, and its downstream effector kinase, Chk1, during DNA damage signalling. A key tool that made these studies possible was the use of a chimeric Topbp1 protein bearing a cryptochrome domain, Cry2, which triggered condensation of the chimeric Topbp1 protein, and thus activation of ATR and Chk1, in response to irradiation with blue light without the myriad complications associated with actually exposing cells to DNA damage. 

      In this current report Morano and co-workers utilise the same optogenetic Topbp1 system to investigate a different question, namely whether Topbp1 phase-condensation can be inhibited pharmacologically to manipulate downstream ATR-Chk1 signalling. This is of interest, as the therapeutic potential of the ATR-Chk1 pathway is an area of active investigation, albeit generally using more conventional kinase inhibitor approaches. 

      The starting point is a high throughput screen of 4730 existing or candidate small molecule anticancer drugs for compounds capable of inhibiting the condensation of the Topbp1-Cry2mCherry reporter molecule in vivo. A surprisingly large number of putative hits (>300) were recorded, from which 131 of the most potent were selected for secondary screening using activation of Chk1 in response to DNA damage induced by SN-38, a topoisomerase inhibitor, as a surrogate marker for Topbp1 condensation. From this the 10 most potent compounds were tested for interactions with a clinically used combination of SN-38 and 5-FU (FOLFIRI) in terms of cytotoxicity in HCT116 cells. The compound that synergised most potently with FOLFIRI, the GSK3-beta inhibitor drug AZD2858, was selected for all subsequent experiments. 

      AZD2858 is shown to suppress the formation of Topbp1 (endogenous) condensates in cells exposed to SN-38, and to inhibit activation of Chk1 without interfering with activation of ATM or other endpoints of damage signalling such as formation of gamma-H2AX or activation of Chk2 (generally considered to be downstream of ATM). AZD2858 therefore seems to selectively inhibit the Topbp1-ATR-Chk1 pathway without interfering with parallel branches of the DNA damage signalling system, consistent with Topbp1 condensation being the primary target. Importantly, neither siRNA depletion of GSK3-beta nor other GSK3-beta inhibitors were able to recapitulate this effect, suggesting it was a specific non-canonical effect of AZD2858 and not a consequence of GSK3-beta inhibition per se.

      To understand the basis for synergism between AZD2858 and SN-38 in terms of cell killing, the effect of AZD2858 on the replication checkpoint was assessed. This is a response, mediated via ATR-Chk1, that modulates replication origin firing and fork progression in S-phase cells under conditions of DNA damage or when replication is impeded. SN-38 treatment of HCT116 cells markedly suppresses DNA replication; however, this was partially reversed by co-treatment with AZD2858, consistent with the failure to activate ATR-Chk1 conferring a defect in replication checkpoint function.

      Figures 4 and 5 demonstrate that AZD2858 can markedly enhance the cytotoxic and cytostatic effects of SN-38 and FOLFIRI through a combination of increased apoptosis and growth arrest according to dosage and treatment conditions. Figure 6 extends this analysis to cells cultured as spheroids, sometimes considered to better represent tumor responses compared to single cell cultures. 

      Major comments 

      Most of the data presented is of good technical quality and supports the conclusions drawn. There are however a small number of instances where this is not true; ie where the data are of insufficient technical quality, or where the description or interpretation of the results is at variance with the data which is presented. Some examples: 

      (1) Fig. 2E - the claim that "we observed an increase in RPA, Topbp1 and Pol-epsilon levels when CPT and AZD2858 were added together" does not seem to be justified by the data provided. It is also unclear what the purpose/significance of this experiment is.

      Thank you for pointing out the contradiction in Figure 2E. Upon review, we identified an error in the labeling of conditions (CPT and AZD2858 were inadvertently swapped). The corrected figure now clearly shows that, at the 60-minute timepoint after starting replication, the combination of CPT and AZD2858 results in a greater accumulation of TopBP1, Pol ε, and RPA on chromatin compared to CPT alone. We have revised the sentence to: "Our data demonstrate that combining CPT and AZD2858 earlier enhances the accumulation of replication-related factors (RPA, TopBP1, and Pol ε) on chromatin compared to CPT treatment alone, particularly visible at the 60-minute timepoint after starting replication."

      The significance of this experiment lies in its connection to the earlier observation that AZD2858 restores BrdU incorporation when combined with SN-38, as shown in flow cytometry data (Figure 3A). At a molecular level, this was further supported by DNA fiber assays, which revealed that replication tracks (CldU tracts) were longer in the combination treatment compared to SN-38 alone (Figure 3B).

      To strengthen and validate these findings, we chose to employ the Xenopus egg extract system for several reasons. This model provides a highly controlled environment where DNA replication occurs without confounding effects from transcription or translation. Moreover, replication is limited to a single round, offering a unique opportunity to specifically interrogate replication mechanisms. These attributes make the Xenopus model an ideal system to confirm that AZD2858 facilitates replication recovery in the presence of replication stress induced by agents like CPT. With longer treatment, this leads to accumulation of DNA damage and apoptosis (Figure 3D-E and Figure 4A-D).

      (2) Figs. 3 A and C certainly show that the SN-38-mediated suppression of DNA synthesis is modified and partially alleviated by co-treatment with AZD2858. The statement however that "prolonged co-incubation with AZD2858 for 6 and 12 hours effectively abolished the SN-38 induced S-phase checkpoint" is clearly misleading. If this were true, then the BrdU incorporation profiles of the respective samples would be similar or identical to control, which clearly they are not. Clearly AZD2858 is affecting the imposition of the S-phase checkpoint in some way, but not "abolishing" it. 

      We appreciate the reviewer’s detailed observations regarding Figures 3A and 3C and the phrasing in our manuscript. We agree that the term "abolished" is not precise in describing the effects of AZD2858 on the SN-38-induced S-phase checkpoint.

      To clarify: our data indicate that co-treatment with AZD2858 modifies and partially alleviates the SN-38-induced suppression of DNA synthesis, as demonstrated by increased BrdU incorporation relative to SN-38 treatment alone. However, as the reviewer correctly points out, the BrdU incorporation profiles of the co-treated samples do not fully return to the levels of untreated control cells. This suggests that while AZD2858 significantly mitigates the S-phase checkpoint, it does not completely abolish it.

      We have revised the statement in the manuscript to better reflect these findings, as follows: "Prolonged co-incubation with AZD2858 for 6 and 12 hours significantly alleviated the SN-38-induced S-phase checkpoint, as evidenced by the partially increased BrdU incorporation. However, the population of co-treated cells is heterogeneous: some cells exhibit BrdU incorporation levels similar to those of untreated control cells, while others incorporate BrdU at levels comparable to cells treated with SN-38 alone. This indicates that AZD2858 does not fully restore DNA synthesis to control levels across the entire cell population."

      This revised phrasing aligns with the data presented and acknowledges the partial recovery of DNA synthesis observed. Thank you for bringing this to our attention and helping us improve the accuracy of our conclusions.

      (3) Fig. 3 E. The western blots of pDNA-PKcs (S2056) and total DNA-PKcs are really not interpretable. It is possible to sympathise that these reagents are probably extremely difficult to work with and obtain clear results, however uninterpretable results are not acceptable. 

      We agree that the data presented in Fig. 3E are difficult to interpret. As noted by Reviewer 1, we recognize the challenge of obtaining clear and reliable results with these specific reagents. Based on this feedback, and to ensure the robustness of our conclusions, we have decided to exclude these specific blots from the revised manuscript.

      We believe that this adjustment will enhance the clarity and reliability of the manuscript while focusing on the other, more interpretable data presented. Thank you for pointing this out, and we appreciate your understanding.

      (4) Fig. 3D. This is a puzzling image. Described as a PFGE assay, it presumably depicts an agarose gel, with intact genomic DNA at the top and a discrete band below representing fragmented genomic DNA. This is a little surprising, as fragmented genomic DNA does not usually appear as a specific band but as a heterogenous population or "smear". Nevertheless, even if one accepts this premise, it is unclear what is meant by "DSBs remained elevated after the combined treatment" when the intensity of this band is equivalent for both SN-38 and SN-38 + AZD2858 treatments. 

      We thank the reviewer for his insightful comments regarding the PFGE results in Figure 3D. We agree that the appearance of a discrete band, rather than a heterogeneous smear, is atypical for fragmented genomic DNA in this assay. However, by enhancing the signal intensity (as shown below), the expected smear becomes more appreciable.

      Author response image 4.

      Regarding the interpretation of the band intensities, we agree that the signals for SN-38 and SN-38 + AZD2858 appear similar under these specific conditions. At the relatively high concentration of SN-38 used in this experiment (300 nM), it is indeed challenging to observe a more pronounced effect on DNA breaks. This is why we wrote that "DSBs remained elevated after the combined treatment": the band intensity is comparable in cells treated with SN-38 alone and in cells treated with SN-38 combined with AZD2858. However, we note a slightly more intense γH2AX signal over time when AZD2858 is combined with SN-38 compared to SN-38 alone (Figure 3E). Furthermore, under lower, sub-optimal doses of SN-38 and over extended incubation (48 h), we observe a clearer increase in fragmented DNA bands, as demonstrated in Figure 4D.

      Minor comments 

      (1) Fig. 1. A surprisingly large number of compounds scored positive in the primary screen for inhibition of Topbp1 condensation (>300). Of the 131 of these selected for secondary screening using Chk1 activation (S345 phosphorylation) as a readout approximately 2/3 were negative, implying that a majority of the tested compounds inhibited Topbp1 condensation but not Chk1 activation. What could explain that?

      Thank you for this thoughtful comment. The discrepancy between the large number of compounds scoring positive for TopBP1 condensation inhibition and the smaller number inhibiting Chk1 activation (S345 phosphorylation) could be attributed to several factors:

      • Different cell lines and induction methods: The initial screen was conducted in HEK293 TrexFlpin cells overexpressing opto-TopBP1, while the secondary screen used HCT116 cells. In addition, the methods used to induce the respective pathways were distinct: in the primary screen, we employed blue light induction of opto-TopBP1 condensates, whereas in the secondary screen, we used SN-38 treatment to induce DNA replication stress and activate the Chk1 pathway. These differences could account for the varying responses observed in the two screens.

      • The compounds that inhibited TopBP1 condensation might not fully block Chk1 activation. While they disrupt TopBP1 condensation, they may still allow for partial activation of Chk1 or Chk1 activation through alternative mechanisms. For instance, Chk1 activation could be mediated by other signaling pathways or molecules, such as ETAA1, a known Chk1 activator (1). Thus, TopBP1 condensation inhibition does not necessarily translate to complete inhibition of Chk1 activation, especially if ETAA1 is employed by cells as a rescue activator.

      • Some compounds may affect chromosome dynamics, potentially generating mechanical forces or torsional stress that could activate the ATR/Chk1 pathway independently of TopBP1 (2).

      These factors suggest that while the compounds effectively disrupt TopBP1 condensation, they may not always fully inhibit the downstream Chk1 activation, pointing to the complexity of the DNA damage response pathways. 

      (1) Bass, T. E. et al. ETAA1 acts at stalled replication forks to maintain genome integrity. Nat Cell Biol 18, 1185–1195 (2016).

      (2) Kumar, A. et al. ATR Mediates a Checkpoint at the Nuclear Envelope in Response to Mechanical Stress. Cell 158, 633–646 (2014).

      (2) Fig. 2D. The protein-protein interaction assay shown demonstrates that AZD2858 ablates the light-induced auto-interaction between exogenous opto-Topbp1 molecules and ATR plus or minus SN-38, but clearly endogenous Topbp1 molecules do not participate. Why is this? 

      The biotin proximity labeling assay was conducted without exposing cells to light, using a TurboID module fused to TopBP1-mCherry-CRY2. Stable cell lines were then generated in HEK293 TrexFlpIn cells, where endogenous TopBP1 is still expressed. Upon adding doxycycline, the recombinant TurboID-TopBP1-mCherry-Cry2 (opto-TopBP1) is induced at levels comparable to endogenous TopBP1 (Fig 2D).

      Since the opto-TopBP1 construct exhibits behavior similar to that of endogenous TopBP1 (1), we used it to investigate whether TopBP1 self-assembly and its interaction with ATR are influenced by AZD2858 alone or in combination with SN38. Our results show that treatment with SN38 increases the proximity between opto-TopBP1 and the endogenous TopBP1 (not fused to TurboID). However, AZD2858, either alone or in combination with SN38, disrupts the self-assembly of recombinant TopBP1 with itself as well as its interaction with endogenous TopBP1.

      (1) Frattini C, Promonet A, Alghoul E, Vidal-Eychenie S, Lamarque M, Blanchard MP, et al. TopBP1 assembles nuclear condensates to switch on ATR signaling. Molecular Cell. 18 March 2021;81(6):1231-1245.e8.

      Reviewer #3 (Significance (Required)): 

      Significance 

      Liquid phase separation of protein complexes is increasingly recognised as a fundamental mechanism in signal transduction and other cellular processes. One recent and important example was that of Topbp1, whose condensation in response to DNA damage is required for efficient activation of the ATR-Chk1 pathway. The current study asks a related but distinct question; can protein condensation be targeted by drugs to manipulate signalling pathways which in the main rely on protein kinase cascades? 

      Here, the authors identify an inhibitor of GSK3-beta as a novel inhibitor of DNA damage-induced Topbp1 condensation and thus of ATR-Chk1 signalling. 

      This work will be of interest to researchers in the fields of DNA damage signalling, biophysics of protein condensation, and cancer chemotherapy.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      In this paper by Brickwedde et al., the authors observe an increase in posterior alpha when anticipating auditory as opposed to visual targets. The authors also observe an enhancement in both visual and auditory steady-state sensory evoked potentials in anticipation of auditory targets, in correlation with enhanced occipital alpha. The authors conclude that alpha does not reflect inhibition of early sensory processing, but rather orchestrates signal transmission to later stages of the sensory processing stream. However, there are several major concerns that need to be addressed in order to draw this conclusion.

      First, I am not convinced that the frequency tagging method and the associated analyses are adequate for dissociating visual vs auditory steady-state sensory evoked potentials.

      Second, if the authors want to propose a general revision for the function of alpha, it would be important to show that alpha effects in the visual cortex for visual perception are analogous to alpha effects in the auditory cortex for auditory perception.

      Third, the authors propose an alternative function for alpha - that alpha orchestrates signal transmission to later stages of the sensory processing stream. However, the supporting evidence for this alternative function is lacking. I will elaborate on these major concerns below.

      (1) Potential bleed-over across frequencies in the spectral domain is a major concern for all of the results in this paper. The fact that alpha power, 36Hz and 40Hz frequency-tagged amplitude and 4Hz intermodulation frequency power are generally correlated with one another amplifies this concern. The authors are attaching specific meaning to each of these frequencies, but perhaps there is simply a broadband increase in neural activity when anticipating an auditory target compared to a visual target?

      We appreciate the reviewer’s insightful comment regarding the potential bleed-over across frequencies in the spectral domain. We fully acknowledge that the trade-off between temporal and frequency resolution is a challenge, particularly given the proximity of the frequencies we are examining.

      To address this concern, we performed additional analyses to investigate whether there is indeed a broadband increase in neural activity when anticipating an auditory target as compared to a visual target, as opposed to distinct frequency-specific effects. Our results show that the bleed-over between frequencies is minimal and does not significantly affect our findings. Specifically, we repeated the analyses using the same filter and processing steps for the 44 Hz frequency. At this frequency, we did not observe any significant differences between conditions.

      These findings suggest that the effects we report are indeed specific to the 40 Hz frequency band and not due to a general broadband increase in neural activity. We hope this addresses the reviewer’s concern and strengthens the validity of our frequency-specific results.

      Author response image 1.

      Illustration of bleed-over effects over a span of 4 Hz. A, 40 Hz frequency-tagging data over the significant cluster differing between expectation of an auditory versus a visual target (identical to Fig. 9 in the manuscript). B, 44 Hz signal over the same cluster chosen for A. The analysis was identical to that performed in A, apart from the frequency of the band-pass filter.
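
      The dependence of bleed-over on the analysis window can be illustrated with a toy example. The sketch below does not reproduce our band-pass filter and Hilbert transform pipeline; it swaps in a simpler whole-window complex demodulation, and the sampling rate, window lengths, and function names are assumptions chosen for illustration. It shows that a pure 40 Hz signal contributes essentially nothing at 44 Hz when the window is long enough to resolve 4 Hz, but leaks substantially when the window is very short.

```python
import cmath
import math


def sine(freq, fs, duration, amplitude=1.0):
    """Synthetic sinusoid at `freq` Hz sampled at `fs` Hz for `duration` seconds."""
    n = round(fs * duration)
    return [amplitude * math.sin(2 * math.pi * freq * k / fs) for k in range(n)]


def tagged_amplitude(signal, fs, freq):
    """Amplitude of `signal` at `freq` Hz, estimated by complex demodulation
    (projection onto a complex exponential) over the whole window."""
    n = len(signal)
    acc = sum(x * cmath.exp(-2j * math.pi * freq * k / fs)
              for k, x in enumerate(signal))
    return 2 * abs(acc) / n


fs = 1000                         # Hz, hypothetical sampling rate
long_win = sine(40, fs, 1.0)      # 1 s window: 4 Hz spacing is well resolved
short_win = sine(40, fs, 0.1)     # 0.1 s window: 4 Hz spacing is not resolved

print("40 Hz amplitude, 1 s window:  ", tagged_amplitude(long_win, fs, 40))
print("44 Hz amplitude, 1 s window:  ", tagged_amplitude(long_win, fs, 44))
print("44 Hz amplitude, 0.1 s window:", tagged_amplitude(short_win, fs, 44))
```

      In this toy setting the leakage is governed purely by the window length, which is the trade-off between temporal and frequency resolution mentioned above.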

      We do not, however, specifically argue against the possibility of a broadband increase when anticipating an auditory compared to a visual target. But even a broadband increase would directly contradict the alpha inhibition hypothesis, which posits that an increase in alpha completely disengages the whole cortex. We will clarify this point in the revised manuscript.

      (2) Moreover, 36Hz visual and 40Hz auditory signals are expected to be filtered in the neocortex. Applying standard filters and Hilbert transform to estimate sensory evoked potentials appears to rely on huge assumptions that are not fully substantiated in this paper. In Figure 4, 36Hz "visual" and 40Hz "auditory" signals seem largely indistinguishable from one another, suggesting that the analysis failed to fully demix these signals.

      We appreciate the reviewer’s insightful concern regarding the filtering and demixing of the 36 Hz visual and 40 Hz auditory signals, and we share the same reservations about the reliance on standard filters and the Hilbert transform method.

      To address this, we would like to draw attention to Author response image 1, which demonstrates that a 4 Hz difference is sufficient to effectively demix the signals using our chosen filtering and Hilbert transform approach. We believe that the reason the 36 Hz visual and 40 Hz auditory signals show similar topographies lies not in incomplete demixing but rather in the possibility that this condition difference reflects sensory integration, rather than signal contamination.

      This interpretation is further supported by our findings with the intermodulation frequency at 4 Hz, which also suggests cross-modal integration. Furthermore, source localization analysis revealed that the strongest condition differences were observed in the precuneus, an area frequently associated with sensory integration processes. We will expand on this in the discussion section to better clarify this point.
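      The demixing argument above can be illustrated with a small numerical sketch. It does not reproduce the band-pass filter plus Hilbert transform used in the manuscript; it swaps in plain complex demodulation over a fixed window, which is a simpler stand-in for the same idea. The sampling rate, window length, amplitudes, and the function name `tagged_amplitude` are all assumptions chosen for illustration.

```python
import cmath
import math


def tagged_amplitude(signal, fs, freq):
    """Amplitude of the `freq` Hz component, estimated by complex demodulation.

    The signal is multiplied by a complex exponential at `freq` and averaged
    over the whole window (a stand-in for band-pass filtering plus Hilbert).
    """
    acc = 0j
    for k, x in enumerate(signal):
        acc += x * cmath.exp(-2j * math.pi * freq * k / fs)
    return 2 * abs(acc) / len(signal)


fs = 1000      # Hz, hypothetical sampling rate
n = fs         # 1 s window: integer number of cycles at 36, 40 and 44 Hz
t = [k / fs for k in range(n)]

# Hypothetical mixture of a 36 Hz "visual" and a 40 Hz "auditory" tag.
visual = [1.0 * math.sin(2 * math.pi * 36 * tk) for tk in t]
auditory = [0.5 * math.sin(2 * math.pi * 40 * tk + 0.3) for tk in t]
mixed = [v + a for v, a in zip(visual, auditory)]

amp36 = tagged_amplitude(mixed, fs, 36)   # recovers the 36 Hz amplitude
amp40 = tagged_amplitude(mixed, fs, 40)   # recovers the 40 Hz amplitude
amp44 = tagged_amplitude(mixed, fs, 44)   # control frequency with no tag
```

      With a window that holds whole cycles of each frequency, the components 4 Hz apart do not leak into one another in this idealised case. Real data with noise, shorter windows, and finite filter roll-off would show some leakage, which is what the 44 Hz control analysis is meant to bound.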

      (3) The asymmetric results in the visual and auditory modalities preclude a modality-general conclusion about the function of alpha. However, much of the language seems to generalize across sensory modalities (e.g., use of the term 'sensory' rather than 'visual').

      We thank the reviewer for pointing this out and agree that in some cases we have not made a good enough distinction between visual and sensory. We will make sure that, when using ‘sensory’, we either describe overall theories that are not exclusive to vision or refer to the possibility of a broad sensory increase. However, when directly discussing our results and their interpretation, we will now use ‘visual’ in the revised manuscript.

      (4) In this vein, some of the conclusions would be far more convincing if there was at least a trend towards symmetry in source-localized analyses of MEG signals. For example, how does alpha power in the primary auditory cortex (A1) compare when anticipating auditory vs visual target? What do the frequency-tagged visual and auditory responses look like when just looking at the primary visual cortex (V1) or A1?

      We thank the reviewer for this important suggestion and have added a virtual channel analysis. We were, however, not interested in alpha power in primary auditory cortex, as we were specifically interested in posterior alpha, which is usually increased when expecting an auditory compared to a visual target (and used to be interpreted as a blanket inhibition of the visual cortex). We will make this point clearer in the manuscript.

      We have, however, followed the reviewer’s suggestion of a virtual channel analysis, showing that the condition differences are not observable in primary visual cortex for the 36 Hz visual signal or in primary auditory cortex for the 40 Hz auditory signal. Our data clearly show that there is an alpha condition difference in V1, while there is no condition difference for 36 Hz in V1 or for 40 Hz in Heschl’s gyrus (see Author response image 2).

      Author response image 2.

      Virtual channels for V1 and Heschl’s gyrus. A, alpha power for the virtual channel created in V1 (Calcarine_L and Calcarine_R from AAL atlas; Tzourio-Mazoyer et al., 2002, NeuroImage). A cluster permutation analysis over time (between -2 and 0 s) revealed a significant condition difference between ~ -2 and -1.7 s (p = 0.0449). B, 36 Hz frequency-tagging signal for the virtual channel created in V1 (equivalent to the procedure in A). The same cluster permutation as performed in A revealed no significant condition differences. C, 40 Hz frequency-tagging signal for the virtual channel created in Heschl’s gyrus (Heschl_L and Heschl_R from AAL atlas; Tzourio-Mazoyer et al., 2002, NeuroImage). The same cluster permutation as performed in A revealed no significant condition differences.

      (5) Blinking would have a huge impact on the subject's ability to ignore the visual distractor. The best thing to do would be to exclude from analysis all trials where the subjects blinked during the cue-to-target interval. The authors mention that in the MEG experiment, "To remove blinks, trials with very large eye-movements (> 10 degrees of visual angle) were removed from the data (See supplement Fig. 5)." This sentence needs to be clarified since eye-movements cannot be measured during blinking. In addition, it seems possible to remove putative blink trials from EEG experiments as well, since blinks can be detected in the EEG signals.

      We thank the reviewer for pointing out that our description was confusing. From the MEG data, we removed eyeblinks using ICA. Only for the analysis in supplementary Fig. 5 did we use the eye-tracking data, to confirm that participants were in fact fixating the centre of the screen. For this analysis, we removed trials with blinks (which can be seen in the eye-tracker as huge amplitude movements or as large eye-movements in degrees of visual angle; see Author response image 3 below, which shows a blink in the MEG data and the corresponding eye-tracker data in degrees of visual angle). We will clarify this in the methods section.

      As for the concern that participants might have closed their eyes to ignore the visual distractors, in both experiments we observe a highly significant distractor cost in accuracy for visual distractors, which we hope will convince the reviewer that our visual distractors were working as intended.

      Author response image 3.

      Illustration of eye-tracker data for a trial without and a trial with a blink. All data points recorded during each trial are plotted. A, ICA component 1, which reflects blinks, and its corresponding data trace in a trial. No blink is visible. B, eye-tracker data transformed into degrees of visual angle for the trial depicted in A. C, ICA component 1, which reflects blinks, and its corresponding data trace in a trial. A clear blink is visible. D, eye-tracker data transformed into degrees of visual angle for the trial depicted in C.

      (6) It would be interesting to examine the neutral cue trials in this task. For example, comparing auditory vs visual vs neutral cue conditions would be indicative of whether alpha was actively recruited or actively suppressed. In addition, comparing spectral activity during cue-to-target period on neutral-cue auditory correct vs incorrect trials should mimic the comparison of auditory-cue vs visual-cue trials. Likewise, neutral-cue visual correct vs incorrect trials should mimic the attention-related differences in visual-cue vs auditory-cue trials.

      We thank the reviewer for this suggestion. We have analysed the neutral cue trials in the EEG dataset (see suppl. Fig. 1) and will expand this figure to show all conditions. There were no significant differences from auditory or visual cues, but descriptively alpha power was higher for neutral cues compared to visual cues and lower for neutral cues compared to auditory cues. While this may suggest that alpha is actively suppressed for visual trials and actively recruited for auditory trials, we do not feel comfortable making this claim, as the neutral condition may not reflect a completely neutral state. The neutral task can still be difficult, especially because of the uncertainty of the target modality.

      As for the analysis of incorrect versus correct trials, we love the idea, but unfortunately the accuracy rate was quite high so that the number of incorrect trials would not be sufficient to perform a reliable analysis.

      (7) In the abstract, the authors state that "This implies that alpha modulation does not solely regulate 'gain control' in early sensory areas but rather orchestrates signal transmission to later stages of the processing stream." However, I don't see any supporting evidence for the latter claim, that alpha orchestrates signal transmission to later stages of the processing stream. If the authors are claiming an alternative function to alpha, this claim should be strongly substantiated.

      We thank the reviewer for pointing out that we have not sufficiently explained our case. The first point refers to gain control akin to the alpha inhibition hypothesis, which claims that increases in alpha disengage a whole cortical area. Since we have confirmed through source analysis that the alpha increase in our data originates from primary visual cortex, this should lead to decreased visual processing. The increase in 36 Hz visual processing therefore directly contradicts the alpha inhibition hypothesis. We propose an alternative explanation for the functionality of alpha activity in this task. Through pulsed inhibition, information packages of relevant visual information could be transmitted down the processing stream, thereby enhancing relevant visual signal transmission. We believe that our claim is supported by the fact that the enhanced visual 36 Hz signal correlated with visual alpha power on a trial-by-trial basis and originated not from primary visual cortex but from areas known for sensory integration.

      We will make this point clearer in our revised manuscript.

      Reviewer #2 (Public review):

      Brickwedde et al. investigate the role of alpha oscillations in allocating intermodal attention. A first EEG study is followed up with a MEG study that largely replicates the pattern of results (with small to be expected differences). They conclude that a brief increase in the amplitude of auditory and visual stimulus-driven continuous (steady-state) brain responses prior to the presentation of an auditory - but not visual - target speaks to the modulating role of alpha that leads them to revise a prevalent model of gating-by-inhibition.

      Overall, this is an interesting study on a timely question, conducted with methods and analysis that are state-of-the-art. I am particularly impressed by the authors' decision to replicate the earlier EEG experiment in MEG following the reviewer's comments on the original submission. Evidently, great care was taken to accommodate the reviewer's suggestions.

      We thank the reviewer for the positive feedback and expression of interest in the topic of our manuscript.

      Nevertheless, I am struggling with the report for two main reasons: It is difficult to follow the rationale of the study, due to structural issues with the narrative and missing information or justifications for design and analysis decisions, and I am not convinced that the evidence is strong, or even relevant enough for revising the mentioned alpha inhibition theory. Both points are detailed further below.

      We thank the reviewer for raising this important point. We will revise our introduction and results in line with the reviewer’s suggestions, hoping that our rationale will then be easier to follow and that our evidence will be more convincing.

      Strength/relevance of evidence for model revision: The main argument rests on 1) a rather sustained alpha effect following the modality cue, 2) a rather transient effect on steady-state responses just before the expected presentation of a stimulus, and 3) a correlation between those two. Wouldn't the authors expect a sustained effect on sensory processing, as measured by steady-state amplitude irrespective of which of the scenarios described in Figure 1A (original vs revised alpha inhibition theory) applies? Also, doesn't this speak to the role of expectation effects due to consistent stimulus timing? An alternative explanation for the results may look like this: Modality-general increased steady-state responses prior to the expected audio stimulus onset are due to increased attention/vigilance. This effect may be exclusive (or more pronounced) in the attend-audio condition due to higher precision in temporal processing in the auditory sense or, vice versa, too smeared in time due to the inferior temporal resolution of visual processing for the attend-vision condition to be picked up consistently. As expectation effects will build up over the course of the experiment, i.e., while the participant is learning about the consistent stimulus timing, the correlation with alpha power may then be explained by a similar but potentially unrelated increase in alpha power over time.

      We thank the reviewer for raising these insightful questions and suggestions.

      It is true that our argument rests on a rather sustained alpha effect and a rather transient effect on steady-state responses and a correlation between the two. However, this connection would not be expected under the alpha inhibition hypothesis, which states that alpha activity would inhibit a whole cortical area (when irrelevant to the task), exerting “gain control”. This notion directly contradicts our results of the “irrelevant” visual information a) being transmitted at all and b) increasing.

      However, it has been shown on many occasions that alpha activity exerts pulsed inhibition, so we proposed an alternative theory of an involvement in signal transmission. In this case, the cyclic inhibition would serve as an ordering system, which only allows high-priority information to pass, resulting in a higher signal-to-noise ratio. We do not make a claim about how fast or when these signals are transmitted in relation to alpha power. For instance, it could be that alpha power increases as a preparatory state even before the signal is actually transmitted. Zhigalov (2020, Hum. Brain Mapp.) has shown that in V1, frequency-tagging responses were up- and down-regulated with attention, independent of alpha activity.

      But we do believe that the fact that visual alpha power correlates on a trial-by-trial level with visual 36 Hz frequency-tagging increases (a relationship which has not been found in V1, see Zhigalov 2020, Hum. Brain Mapp.) suggests a strong connection. Furthermore, the fact that the alpha modulation originates from early visual areas and occurs prior to any frequency-tagging changes, while the increase in frequency-tagging can be observed in areas which are later in the processing stream (such as the precuneus), is strongly indicative of an involvement of alpha power in the transmission of this signal. We cannot fully exclude alternative accounts and mechanisms which affect both alpha power and frequency-tagging responses.

      We do believe that the alternative account described by the reviewer does not contradict our theory, as we do believe that the alpha power modulation may reflect an expectation effect (and the idea that it could be related to the resolution of auditory versus visual processing is very interesting!). It is also possible that this expectation is, as the reviewer suggests, related to attention/vigilance and might result in a modality-general signal increase. And indeed, we can observe an increase in the frequency-tagging response in sensory integration areas. Accordingly, we believe that the alternative explanation provided by the reviewer contradicts the alpha inhibition hypothesis, but not necessarily our alternative theory.

      We will revise the discussion, which we hope will make our case stronger and easier to follow. Additionally, we will mention the possibility of alternative explanations as well as the possibility that alpha networks fulfil different roles in different locations/task environments.

      Structural issues with the narrative and missing information: Here, I am mostly concerned with how this makes the research difficult to access for the reader. I list the major points below:

      In the introduction the authors pit the original idea about alpha's role in gating against some recent contradictory results. If it's the aim of the study to provide evidence for either/or, predictions for the results from each perspective are missing. Also, it remains unclear how this relates to the distinction between original vs revised alpha inhibition theory (Fig. 1A). Relatedly, if this revision is an outcome rather than a postulation for this study, it shouldn't be featured in the first figure.

      We agree with the reviewer that we have not sufficiently clarified our goal as well as how different functionalities of alpha oscillations would lead to different outcomes. We will revise the introduction and restructure the results and hope that it will be easier to follow.

      The analysis of the intermodulation frequency makes a surprise entrance at the end of the Results section without an introduction as to its relevance for the study. This is provided only in the discussion, but with reference to multisensory integration, whereas the main focus of the study is focussed attention on one sense. (Relatedly, the reference to "theta oscillations" in this section seems unclear without a reference to the overlapping frequency range, and potentially more explanation.) Overall, if there's no immediate relevance to this analysis, I would suggest removing it.

      We thank the reviewer for pointing this out and will add information about this frequency to the introduction part. We believe that the intermodulation frequency analysis is important, as it potentially supports the notion that condition differences in the visual-frequency tagging response are related to downstream processing rather than overall visual information processing in V1. We would therefore prefer to leave this analysis in the manuscript.

      Reviewer #3 (Public review):

      Brickwedde et al. attempt to clarify the role of alpha in sensory gain modulation by exploring the relationship between attention-related changes in alpha and attention-related changes in sensory-evoked responses, which surprisingly few studies have examined given the prevalence of the alpha inhibition hypothesis. The authors use robust methods and provide novel evidence that alpha likely exhibits inhibitory control over later processing, as opposed to early sensory processing, by providing source-localization data in a cross-modal attention task.

      This paper seems very strong, particularly given that the follow-up MEG study both (a) clarifies the task design and separates the effect of distractor stimuli into other experimental blocks, and (b) provides source-localization data to more concretely address whether alpha inhibition is occurring at or after the level of sensory processing, and (c) replicates most of the EEG study's key findings.

      We are very grateful to the reviewer for their positive feedback and evaluation of our work.

      There are some points that would be helpful to address to bolster the paper. First, the introduction would benefit from a somewhat deeper review of the literature, not just reviewing when the effects of alpha seem to occur, but also addressing how the effect can change depending on task and stimulus design (see review by Morrow, Elias & Samaha, 2023).

      We thank the reviewer for this suggestion and agree. We will add a paragraph to the introduction which refers to missing correlation studies and the impact of task design.

      Additionally, the discussion could benefit from more cautionary language around the revision of the alpha inhibition account. For example, it would be helpful to address some of the possible discrepancies between alpha and SSEP measures in terms of temporal specificity, SNR, etc. (see Peylo, Hilla, & Sauseng, 2021). The authors do a good job speculating as to why they found differing results from previous cross-modal attention studies, but I'm also curious whether the authors think that alpha inhibition/modulation of sensory signals would have been different had the distractors been within the same modality or whether the cues indicated target location, rather than just modality, as has been the case in so much prior work?

      We thank the reviewer for suggesting these interesting discussion points and will include a paragraph in our discussion which goes deeper into these topics.

      Overall, the analyses and discussion are quite comprehensive, and I believe this paper to be an excellent contribution to the alpha-inhibition literature.

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript, "A versatile high-throughput assay based on 3D ring-shaped cardiac tissues generated from human induced pluripotent stem cell-derived cardiomyocytes" developed a unique culture platform with PEG hydrogel that facilitates the in-situ measurement of contractile dynamics of the engineered cardiac rings. The authors optimized the tissue seeding conditions, demonstrated tissue morphology with expressions of cardiac and fibroblast markers, mathematically modeled the equation to derive contractile forces and other parameters based on imaging analysis, and ended by testing several compounds with known cardiac responses.

      To strengthen the paper, the following comments should be considered:

      1) This paper provided an intriguing platform that creates miniature cardiac rings with merely thousands of CMs per tissue in a 96-well plate format. The shape of the ring and the squeezing motion can recapitulate the contraction of the cardiac chamber to a certain degree. However, Thavandiran et al (PNAS 2013) created a larger version of the cardiac ring and found the electrical propagation revealed spontaneous infinite loop-like cycles of activation propagation traversing the ring. This model was used to mimic a reentrant wave during arrhythmia. Therefore, it presents great concerns if a large number of cardiac tissues experience arrhythmia by geometry-induced re-entry current and cannot be used as a healthy tissue model. It would be interesting to see the impulse propagation/calcium transient on these miniature cardiac rings and evaluate the % of arrhythmia occurrence.

      Size is a key factor affecting electrical propagation within the generated tissues. Our ring-shaped cardiac tissues have a diameter of 360 µm, which is considerably smaller than other tissues proposed so far, including in Thavandiran et al (PNAS 2013), where circular tissues had a reported size > 1 mm. As shown in Figure 4E (and highlighted below in Author response image 1), tissues under basal conditions display regular beating rates without spontaneous arrhythmias. Videos also show that the tissue contraction is homogeneous around the pillar, suggesting that the smaller size favors electrical propagation and limits the occurrence of spontaneous reentrant waves. Optical mapping measurements will be performed in the future to assess the occurrence of reentrant waves.

      Author response image 1.

      Poincaré plot showing the relationship between successive RR intervals (Data from Figure 4E in basal conditions). Linear regression with 95% confidence interval indicates identity.

      2) The platform can produce 21 cardiac rings per well in 96-well plates. The throughput has been the highest among competing platforms. The resulting tissues have good sarcomere striation due to the strain from the pillars. Now the emerging questions are culture longevity and reproducibility among tissues. According to Figure 1E, there was uneven ring formation around the pillar, which leads to the tissue thinning and breaking off. There is only 50% survival after 20 days of culture in the optimized seeding group. Is there any way to improve it? The tissues had two compartments, cardiac and fibroblast-rich regions, where fibroblasts are responsible for maintaining the attachment to the glass slides. Do the cardiac rings detach from the glass slides and roll up? The SD of the force measurement is a quarter of the value, which is not ideal with such a high replicate number. As the platform utilizes imaging analysis to derive contractile dynamics, calibration should be done based on the angle and the distance of the camera lens to the individual tissues to reduce the error. On the other hand, how reproducible are the pillars? It is highly recommended to mechanically evaluate the consistency of the hydrogel-based pillars across different wells and within the wells to understand the variance.

      Figure 2B reports the early results obtained as the system was tested and developed. Since then, we have tested different iPSC lines and confirm that the overall yield is higher (up to 20 tissues at D14 for some cell lines), although it depends on the cell line.

      The tissues do not detach from the glass slides. It is very rare to see tissues roll up on the central pillar. As shown in Figure 1B, the pillars have a specific shape that prevents tissues from rolling up as they develop and contract.

      3) Does the platform allow the observation of non-synchronized beating when testing with compounds? This can be extremely important as the intended applications of this platform are drug testing and cardiac disease modeling. The author should elaborate on the method in the manuscript and explain the obtained results in detail.

      The arrhythmogenic effect of a drug can be derived from the regularity of the beat-to-beat time. Indeed, we show that dofetilide increases the variability in the beat-to-beat time by plotting, for each beat, the beat-to-beat time with the next beat as a function of the beat-to-beat time with the previous beat.
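      The beat-to-beat plot described here can be sketched in a few lines. The snippet below is an illustrative assumption rather than the authors' analysis code: it builds the (previous interval, next interval) pairs from beat times and summarises their scatter with the standard Poincaré descriptors SD1 (spread perpendicular to the identity line) and SD2 (spread along it). The beat times and the function names are hypothetical.

```python
import math
import statistics


def poincare_pairs(beat_times):
    """Return (interval to previous beat, interval to next beat) for each beat."""
    intervals = [b - a for a, b in zip(beat_times, beat_times[1:])]
    return list(zip(intervals[:-1], intervals[1:]))


def sd1_sd2(pairs):
    """Standard Poincaré descriptors: spread across and along the identity line."""
    across = [(nxt - prev) / math.sqrt(2) for prev, nxt in pairs]
    along = [(nxt + prev) / math.sqrt(2) for prev, nxt in pairs]
    return statistics.pstdev(across), statistics.pstdev(along)


# Hypothetical beat times in seconds (not data from the study).
regular_beats = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
irregular_beats = [0.0, 0.8, 2.0, 2.7, 4.1, 5.0]

regular_sd1, regular_sd2 = sd1_sd2(poincare_pairs(regular_beats))
irregular_sd1, irregular_sd2 = sd1_sd2(poincare_pairs(irregular_beats))
```

      Perfectly regular beating collapses all points onto a single spot on the identity line, so SD1 is zero. Beat-to-beat irregularity of the kind reported for dofetilide would show up as a larger SD1.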

      4) The results of drug testing are interesting. Isoproterenol typically causes positive chronotropic and positive inotropic responses, where inotropic responses are difficult to obtain due to low tissue maturity. It is inconsistent with other reported results that cardiac rings do not exhibit increased beating frequency, but slightly increased forces only. Zhao et al were using electrical pacing at a defined rate during force measurement, whereas the ring constructs are not.

      We agree. The difference in the response to isoproterenol compared with previous papers may be explained by different incubation timing with the drug. In our case, the tissues were incubated for 5 minutes at 37 °C before being recorded.

      Overall, the manuscript is well written and the designed platform presented the unique advantages of high throughput cardiac tissue culture. Besides the contractile dynamics and IHC images, the paper lacks other cardiac functional evaluations, such as calcium handling, impulse propagation, and/or electrophysiology. The culture reproducibility (high SD) and longevity (<20 days) still remain unsolved.

      Since the submission, we have managed to keep some tissues and analyze them up to 32 days. At that time point the tissues are still beating. Nevertheless, a specific study of tissue longevity has not been carried out, as the tissues were usually fixed after 14 days for staining and structural analysis.

      Reviewer #2 (Public Review):

      The authors should be commended for developing a high throughput platform for the formation and study of human cardiac tissues, and for discussing its potential, advantages and limitations. The study is addressing some of the key needs in the use of engineered cardiac tissues for pharmacological studies: ease of use, reproducible preparation of tissues, and high throughput.

      There are also some areas where the manuscript should be improved. The design of the platform and the experimental design should be described in more detail.

      It would be of interest to comprehensively document the progression of tissue formation. To this end, it would be helpful to show the changes in tissue structure through a series of images that would correspond to the progression of contractile properties shown in Figure 3.

      Our results indicate that the fibroblasts/cardiomyocytes segregation likely happens as soon as the tissue is formed, as the fibroblasts are critical for tissue generation. The change with time in the shape of the contractile ring is reported in Figure 1E, with a series of images which correspond to the timepoints of Figure 3.

      The very interesting tissue morphology (separation into the two regions) that was observed in this study is inviting more discussion.

      Finally, the reader would benefit from more specific comparisons of the contractile function of cardiac tissues measured in this study with data reported for other cardiac tissue models.

    1. Author Response

      We thank the reviewers for truly valuable advice and comments. We have made multiple corrections and revisions to the original pre-print accordingly. Here we address 2 major points.

      1) Regarding the genetic association of the common COL11A1 variant rs3753841 (p.(Pro1335Leu)), we do not propose that it is the sole risk variant contributing to the association signal we detected and have clarified this in the manuscript. We concluded that it was worthy of functional testing for reasons described here. Although there were several common variants in the discovery GWAS within and around COL11A1, none were significantly associated with AIS and none were in linkage disequilibrium (R2>0.6) with the top SNP rs3753841. We next reviewed rare (MAF<=0.01) coding variants within the COL11A1 LD region of the associated SNP (rs3753841) in 625 available exomes representing 46% of the 1,358 cases from the discovery cohort. The LD block was defined using Haploview based on the 1KG_CEU population. Within the ~41 KB LD region (chr1:103365089-103406616, GRCh37) we found three rare missense mutations in 6 unrelated individuals (Author response table 1). Two of them (NM_080629.2: c.G4093A:p.A1365T; NM_080629.2:c.G3394A:p.G1132S), from two individuals, are predicted to be deleterious based on CADD and GERP scores and are plausible AIS risk candidates. At this rate we could expect to find only 4-5 individuals with linked rare coding variants in the total cohort of 1,358, which collectively are unlikely to explain the overall association signal we detected. Of course, there also could be deep intronic variants contributing to the association that we would not detect by our methods. However, given this scenario, the relatively high predicted deleteriousness of rs3753841 (CADD= 25.7; GERP=5.75), and its occurrence in a Gly-X-Y triplet repeat, we hypothesized that this variant itself could be a risk allele worthy of further investigation.

      Author response table 1.
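      The "4-5 individuals" estimate above can be checked with simple proportional scaling. The sketch below assumes the estimate extrapolates the two carriers of predicted-deleterious variants from the 625 sequenced exomes to the full discovery cohort; that reading is an inference from the text, and the variable names are hypothetical.

```python
# Counts quoted in the response.
exomes_sequenced = 625        # exomes available for rare-variant review
cohort_size = 1358            # total cases in the discovery cohort
deleterious_carriers = 2      # individuals with predicted-deleterious variants

# Fraction of the cohort covered by exome data.
coverage = exomes_sequenced / cohort_size

# Assumption: carriers occur at the same rate in the unsequenced cases.
expected_carriers = deleterious_carriers * cohort_size / exomes_sequenced
```

      Under this assumption the coverage rounds to 46% and the expected carrier count falls between 4 and 5, which matches the figures quoted in the response.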

      We also appreciate the reviewer’s suggestion to perform a rare variant burden analysis of COL11A1. We conducted a pilot gene-based analysis in 4534 European ancestry exomes, including 797 of our own AIS cases and 3737 controls, and tested the burden of rare variants in COL11A1. The SKATO P value was not significant (COL11A1_P=0.18), but this could be due to lack of power and/or background from rare benign variants that could be screened out using the functional testing we have developed.

      2) Regarding functional testing, by knockdown/knockout cell culture experiments, we showed for the first time that Col11a1 negatively regulates Mmp3 expression in cartilage chondrocytes, an AIS-relevant tissue. We then tested the effect of overexpressing the human wt or variant COL11A1 by lentiviral transduction in SV40-transformed chondrocyte cultures. We deleted endogenous mouse Col11a1 by Cre recombination to remove the background of its strong suppressive effects on Mmp3 expression. We acknowledge that Col11a1 missense mutations could confer gain of function or dominant negative effects that would not be revealed in this assay. However, as indicated in our original manuscript, spinal deformity is described in the cho/cho mouse, a Col11a1 loss of function mutant. We also note the recent publication by Rebello et al. showing that missense mutations in Col11a2 associated with congenital scoliosis fail to rescue a vertebral malformation phenotype in a zebrafish col11a2 KO line. Although the connection between AIS and vertebral malformations is not altogether clear, we surmise that loss of the components of collagen type XI disrupts spinal development. In vivo experiments in vertebrate model systems are needed to fully establish the consequences and genetic mechanisms by which COL11A1 variants contribute to an AIS phenotype.

    1. Reviewer #3 (Public review):

      Summary:

      The authors examine the role of the medial frontal cortex of mice in exploiting statistical structure in tasks. They claim that mice are "proactive": they predict upcoming changes, rather than responding in a "model-free" way to environmental changes. Further, they speculate that the estimation of future change (i.e., prediction of upcoming events, based on learning temporal regularities) might be "a main ... function of dorsal medial frontal cortex (dmFC)." Unfortunately, the current manuscript contains flaws such that the evidence supporting these claims is inadequate.

      Strengths:

      Understanding the neural mechanisms by which we learn about statistical structure in the world is an important goal. The authors developed an interesting task and used model-based techniques to try to understand the mechanisms by which perturbation of dmFC influenced behavior. They demonstrate that lesions and optogenetic silencing of dmFC influence behavior, showing that this region has a causal influence on the task.

      Weaknesses:

      I was concerned that the main behavioral effects shown in Figure 1F were a statistical artifact. By requiring the Geometric block length to be preceded by a performance-based block, the authors introduce a dependence that can generate the phenomena they describe as anticipation.

      To demonstrate this, I simulated their task with an agent that does not have any anticipation of the change point (Reviewer image 1). The agent repeats the previous action with probability `p(repeat)` (similar to the choice kernel in the authors' models). If the agent does not repeat, the next choice depends on the previous outcome. If the previous choice was rewarded, it stays with probability `P(WS)` and chooses randomly with probability `1-P(WS)`. If the previous choice was unrewarded, it switches with probability `P(LS)` and chooses randomly with probability `1-P(LS)`.

      Review image 1.

      An agent with `P(WS)=P(LS)=P(repeat)=0.85` shows the same phenomena as the mice: a difference in performance before the block switch and "earlier" crossing of the midpoint after the switch. https://imgdrop.io/image/aHn6y. The phenomena go away in the simulations when a fixed block length of 20 trials is followed by a Geometric block length.
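
      The agent described above can be sketched in a few lines. This is only an illustration, not the reviewer's simulation code: it assumes a two-choice task in which the correct side is always rewarded and flips at each block boundary, and the names `choose`, `geometric_length`, and `simulate` are hypothetical. Block lengths are passed in by the caller, so a schedule such as a fixed 20-trial block followed by a geometric block can be tried directly.

```python
import random


def choose(prev_choice, prev_rewarded, p_repeat, p_ws, p_ls, rng):
    """Return 0 or 1 using the repeat / win-stay / lose-switch rule."""
    if prev_choice is None:
        return rng.randrange(2)          # first trial: no history, pick at random
    if rng.random() < p_repeat:
        return prev_choice               # repeat the previous action
    if prev_rewarded:
        # Win: stay with probability P(WS), otherwise choose randomly.
        return prev_choice if rng.random() < p_ws else rng.randrange(2)
    # Loss: switch with probability P(LS), otherwise choose randomly.
    return 1 - prev_choice if rng.random() < p_ls else rng.randrange(2)


def geometric_length(hazard, rng):
    """Draw a block length (>= 1) that ends with probability `hazard` per trial."""
    n = 1
    while rng.random() >= hazard:
        n += 1
    return n


def simulate(block_lengths, p_repeat=0.85, p_ws=0.85, p_ls=0.85, seed=0):
    """Run the agent through consecutive blocks; the correct side flips each block.

    Assumption: reward is deterministic (the correct side is always rewarded).
    Returns a list of (choice, correct_side, rewarded) tuples, one per trial.
    """
    rng = random.Random(seed)
    trials = []
    prev_choice, prev_rewarded = None, False
    correct_side = 0
    for length in block_lengths:
        for _ in range(length):
            choice = choose(prev_choice, prev_rewarded, p_repeat, p_ws, p_ls, rng)
            rewarded = choice == correct_side
            trials.append((choice, correct_side, rewarded))
            prev_choice, prev_rewarded = choice, rewarded
        correct_side = 1 - correct_side  # block switch
    return trials
```

      Aligning the returned trials to block boundaries and averaging the proportion of correct choices around each switch would give curves comparable in spirit to Figure 1F; the agent itself has no knowledge of block length.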

      The authors did not completely rely on the phenomena of Figure 1F for their conclusions. They did a model comparison to provide evidence that animals are anticipating the switch. Unfortunately, the authors did not use state-of-the-art methods in this section of the paper. In particular, they failed to show that under a range of generative parameters for each model class, the model selection process chooses the correct model class (i.e. a confusion matrix). A more minor point, they used BIC instead of a more robust cross-validated metric for model selection. Finally, instead of comparing their "best" anticipating model to their 2nd best model (without anticipation), they compared their best to their 4th best (Supp Fig 3.5). This seems misleading.

      Given all of the above issues, it is hard to critically evaluate the model-based analysis of the effects of lesions/optogenetics.

    1. Author response:

      We thank the reviewers for their thoughtful criticisms.  This provisional response addresses what we consider the central critiques, with a full, point-by-point reply to follow with the revised manuscript.  Central critiques concern 1) providing further clarity about the apportionment cost of time, 2) generality & scope, and 3) clarifying the meaning of key equations.

      (1) Apportionment cost

      Reviewers commonly identified a need to provide a concise and intuitive definition of apportionment cost, and to explicitly solve and provide for its mathematical expression. 

      We will add the following definition of apportionment cost to the manuscript: “Apportionment cost is the difference in reward that can be expected, on average, between a policy of taking versus a policy of not taking the considered pursuit, over a time equal to its duration.”  While this difference is the apportionment cost of time, the amount that would be expected over a time equal to the considered pursuit under a policy of not taking the considered pursuit is the opportunity cost of time.  Together, they sum to Time’s Cost.  The above definition of apportionment cost adds to other stated relationships of apportionment cost found throughout the paper (Lines 434,435,447,450). 
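
      The verbal definitions above can be written compactly. The following is only a sketch in hypothetical notation, not the equations of the manuscript: $\rho_{\text{take}}$ and $\rho_{\text{forgo}}$ are assumed symbols for the average reward rates under a policy of taking and of not taking the considered pursuit, and $t$ is the pursuit's duration.

```latex
\begin{align*}
\text{Apportionment cost} &= \left(\rho_{\text{take}} - \rho_{\text{forgo}}\right) t \\
\text{Opportunity cost}   &= \rho_{\text{forgo}}\, t \\
\text{Time's cost}        &= \text{Apportionment cost} + \text{Opportunity cost}
                           = \rho_{\text{take}}\, t
\end{align*}
```

      Under this reading, Time's Cost is simply the reward expected under the policy of taking the pursuit over a time equal to the pursuit's duration.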

      As suggested, we will also add equations of apportionment cost, as below.

      (2) Generality & Scope

      Generality. We will add further examples in support of the generality of these equations for assessing and thinking about the value of initiating a pursuit.  Specifically, this will include 1) illustrating forgo decision making in a world composed of multiple pursuits, as in prey selection, 2) demonstrating and examining worlds in which a sequence of pursuits compose a considered pursuit’s ‘outside’, and 3) clarifying how our framework does contend with variance and uncertainty in reward magnitude and occurrence.

      Scope. In this manuscript, we consider the worth of initiating one or another pursuit having completed a prior one, and not the issue of continuing within a pursuit having already engaged in it.  The worth of continuing a pursuit, as in patch-foraging/give-up tasks, constitutes a third fundamental time decision-making topology which is outside the scope of the current work.  It engages a large and important literature, encompassing evidence accumulation, and requires a paper in its own right that can use the concepts and framework developed here.  We will further consider applying this framework to extant datasets.

      (3) Correction of typographical errors and further explanation of equations.   

      We would like to redress the two typographical errors identified by the reviewers that appeared in the equations on line 277 and on line 306, and provide further explanation to equations that gave pause to the reviewers.

      Typographical errors: 

      The first typographical error in the main text regards equation 2 and will be corrected so that equation 2 appears correctly as…

      Line 277:  

      The second typo regards the definition of the considered pursuit’s reward rate, and will be corrected to appear as…

      Line 306:   

      Regarding equations:

      Cross-reference to equations in the main text refer to equations as they appear in the main text.  Where needed, the appendix in which they are derived is also given.   Equation numbering within the appendices refer to equations as they appear in the appendices.  In the revision, we will refer to all equations that appear in the appendices as Ap.#.#. so as to avoid confusion between referencing equations as they appear in the main text and equations as they appear in the appendices.  

      We would also like to clarify that equation 8, as we derive it, is not new: it is similarly derived and expressed in prior foundational work by McNamara (1982), to which it is now properly attributed.

      Equation 1 and Appendix 1

      Equation 1 is formulated to calculate the average reward received and average time spent per unit time spent in the default pursuit. So, $f_i$ is the encounter rate of pursuit $i$ for one unit of time spent in the default pursuit (lines 259-262). Added to the summation in the numerator, we have the average reward obtained in the default pursuit per unit time, and in the denominator we have the time spent in the default pursuit per unit time (1).

      Equation 2 and Appendix 2

      Eq. 2.4 in Appendix 2 calculates the average time spent outside of the considered pursuit, per encounter with the considered pursuit. Breaking down eq. 2.4, the first term in the numerator,

      $$\sum_{i} f_i t_i$$

      gives the expected time spent in other pursuits, per unit time spent in the default pursuit, where $f_i$ is the encounter rate of pursuit $i$ per unit time spent in the default pursuit, and $t_i$ is the time required by pursuit $i$. The second term in the numerator (1, added outside the summation) simply represents the unit of time spent in the default pursuit, over which the encounter rate of each pursuit is calculated. Together, these represent the total time spent outside the considered pursuit, per unit time spent in the default pursuit. The denominator,

      $$f_{\mathrm{in}}$$

      is the frequency with which the considered pursuit is encountered per unit time spent in the default pursuit, so

      $$\frac{1}{f_{\mathrm{in}}}$$

      is the average time spent within the default pursuit, per encounter with the considered pursuit. By multiplying the average time spent outside of the considered pursuit per unit time spent in the default pursuit by the average time spent within the default pursuit per encounter with the considered pursuit, we get eq. 2.4, the average time spent outside of the considered pursuit, per encounter with the considered pursuit, which is equal to $t_{\mathrm{out}}$.

      $$t_{\mathrm{out}} = \frac{\sum_{i} f_i t_i + 1}{f_{\mathrm{in}}} \qquad \text{(eq. 2.4)}$$
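
      As a numerical illustration of the verbal description of eq. 2.4 above, the calculation can be sketched as follows. The function name `time_outside`, its parameter names, and the example values are hypothetical; the code only encodes the description given here (time in other pursuits plus one unit of default time, divided by the encounter rate of the considered pursuit) and is not taken from the manuscript.

```python
def time_outside(other_rates, other_durations, considered_rate):
    """Average time spent outside the considered pursuit, per encounter with it.

    other_rates:      encounter rate of each other pursuit, per unit time
                      spent in the default pursuit
    other_durations:  time required by each of those pursuits
    considered_rate:  encounter rate of the considered pursuit, per unit
                      time spent in the default pursuit
    """
    if len(other_rates) != len(other_durations):
        raise ValueError("each encounter rate needs a matching duration")
    if considered_rate <= 0:
        raise ValueError("the considered pursuit must be encountered at a positive rate")
    # Time outside the considered pursuit per unit of default time:
    # time in other pursuits plus the unit of default time itself.
    outside_per_unit_default = sum(
        f * t for f, t in zip(other_rates, other_durations)
    ) + 1.0
    # Average default time per encounter with the considered pursuit.
    default_time_per_encounter = 1.0 / considered_rate
    return outside_per_unit_default * default_time_per_encounter


# Hypothetical example: two other pursuits, considered pursuit met 0.2 times
# per unit of default time.
example = time_outside([0.1, 0.05], [4.0, 10.0], 0.2)
```

      With no other pursuits in the world, the function reduces to the reciprocal of the considered pursuit's encounter rate, that is, the default time spent waiting for the next encounter.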

    1. Author response:

      To Reviewer #1:

      Thank you for your thorough review and comments on our work, of which you wrote that “the role of neuritin in T cell biology studied here is new and interesting.” We have grouped your comments into two categories: (i) biology and investigation approach, and (ii) experimental rigor and data presentation.

      Biology and Investigation approach comments:

      (1) Questions regarding the T cell anergy model:

      Major point “(4) Figure 1E-H. The authors assume that this immunization protocol induces anergic cells, but they provide no experimental evidence for this. It would be useful to show that T cells are indeed anergic in this model, especially those that are OVA-specific. The lack of IL-2 production by Cltr cells could be explained by the presence of fewer OVA-specific cells, rather than by an anergic status.”

      T cell anergy is a well-established concept first described by Schwartz’s group. It refers to the hyporesponsive functional state of antigen-experienced CD4 T cells (Chappert and Schwartz, 2010; Fathman and Lineberry, 2007; Jenkins and Schwartz, 1987; Quill and Schwartz, 1987). Anergic T cells are characterized by their inability to expand and to produce IL2 upon subsequent antigen re-challenge. In this paper, we have borrowed the existing in vivo T cell anergy induction model used by Mueller’s group (Vanasek et al., 2006). Specifically, Thy1.1+ Ctrl or Nrn1-/- TCR transgenic OTII cells were co-transferred with congenically marked Thy1.2+ WT polyclonal Treg cells into TCR-/- mice. After anergy induction, the TCR transgenic T cells were recovered by sorting on the Thy1.1+ congenic marker and subsequently restimulated ex vivo with OVA323-339 peptide. We evaluated the T cell anergic state based on OTII cell expansion in vivo and IL2 production upon OVA323-339 restimulation ex vivo.

      “The authors assume that this immunization protocol induces anergic cells, but they provide no experimental evidence for this.”

      Because the anergy model by Mueller's group is well established (Vanasek et al., 2006), we did not feel that additional effort was required to validate this model as the reviewer suggested. Moreover, the limited IL2 production among the control cells upon restimulation confirms the validity of this model.

      “The lack of IL-2 production by Cltr cells could be explained by the presence of fewer OVA-specific cells, rather than by an anergic status”.

      Cells from Ctrl and Nrn1-/- mice on a homogeneous TCR transgenic (OTII) background were used in these experiments. The possibility that substantial variability of TCR expression or different expression levels of the transgenic TCR could have impacted IL2 production rather than anergy induction is unlikely.

      Overall, we used this in vivo anergy model to evaluate the Nrn1-/- T cell functional state in comparison to Ctrl cells under the anergy induction condition following the evaluation of Nrn1 expression, particularly in anergic T cells.  Through studies using this anergy model, we observed a significant change in Treg induction among OTII cells. We decided to pursue the role of Nrn1 in Treg cell development and function rather than the biology of T cell anergy as evidenced by subsequent experiments.

      Minor points “(6) On which markers are anergic cells sorted for RNAseq analysis?”

      Cells were sorted out based on their congenic marker marking Ctrl or Nrn1-/- OTII cells transferred into the host mice.  We did not specifically isolate anergic cells for sequencing.

      (2) Question regarding the validity of iTreg differentiation model.

      Major point: “(5) Figure 2A-C and Figure 3. The use of iTregs to try to understand what is happening in vivo is problematic. iTregs are cells that have probably no equivalent in vivo, and so may have no physiological relevance. In any case, they are different from pTreg cells generated in vivo. Working with pTreg may be challenging, that is why I would suggest generating data with purified nTreg. Moreover, it was shown in the article of Gonzalez-Figueroa 2021 that Nrn1-/- nTreg retained a normal suppressive function, which would not be what is concluded by the authors of this manuscript. Moreover, we do not even know what the % of Foxp3 cells is in the iTreg used (after differentiation and 20h of re-stimulation) and whether this % is the same between Ctlr and Nrn1 KO cells.”.

      We thank Reviewer #1 for their feedback. While it is true that iTregs made in vitro and pTregs generated in vivo display several distinctions (e.g., differences in Foxp3 expression stability), we strongly disagree with this statement by Reviewer #1: “The use of iTregs to try to understand what is happening in vivo is problematic. iTregs are cells that have probably no equivalent in vivo, and so may have no physiological relevance.” The induced Treg cell (iTreg) model was established over 20 years ago (Chen et al., 2003; Zheng et al., 2002), and the model is widely adopted with over 2000 citations. Further, it has been instrumental in understanding different aspects of regulatory T cell biology (Hurrell et al., 2022; John et al., 2022; Schmitt and Williams, 2013; Sugiura et al., 2022).

      Because we observed reduced pTreg generation in vivo, we chose to use the in vitro iTreg model system to understand the mechanistic changes involved in Treg cell differentiation and function, specifically, neuritin’s role in this process. We have made no claim that iTreg cell biology is identical to that of pTreg cells generated in vivo or of nTreg cells. However, the iTreg culture system has proved to be a good in vitro system for deciphering molecular events involved in complex processes. As such, it remains a commonly used approach by many research groups in the Treg cell field (Hurrell et al., 2022; John et al., 2022; Sugiura et al., 2022). Moreover, applying the iTreg in vitro culture system has been instrumental in helping us identify the cell electrical state change in Nrn1-/- CD4 cells and revealed the biological link between Nrn1 and the ionotropic AMPA receptor (AMPAR), which we discuss below. It is technically challenging to use nTreg cells for T cell electrical state studies due to their heterogeneous nature from development in an in vivo environment and the effect of manipulation during the nTreg cell isolation process, both of which can affect the T cell electrical state.

      “Moreover, it was shown in the article of Gonzalez-Figueroa 2021 that Nrn1-/- nTreg retained a normal suppressive function, which would not be what is concluded by the authors of this manuscript.” 

      We have also carried out nTreg studies in vitro in addition to iTreg cells. Similar to Gonzalez-Figueroa et al.'s findings, we did not observe differences in suppression function between Nrn1-/- and WT nTreg using the in vitro suppression assay. However, Nrn1-/- nTreg cells revealed reduced suppression function in vivo (Fig. 2D-L). In fact, Gonzalez-Figueroa et al. observed reduced plasma cell formation after OVA immunization in Treg-specific Nrn1-/- mice, implicating reduced suppression from Nrn1-/- follicular regulatory T (Tfr) cells. Thus, our observation of the reduced suppression function of Nrn1-/- nTreg toward effector T cell expansion, as presented in Fig. 2D-L, does not contradict the results from Gonzalez-Figueroa et al. Rather, the conclusions of these two studies agree that Nrn1 can play important roles in immune suppression observable in vivo that are not captured readily by the in vitro suppression assay.

      “Moreover, we do not even know what the % of Foxp3 cells is in the iTreg used (after differentiation and 20h of re-stimulation) and whether this % is the same between Ctlr and Nrn1 KO cells.”

      We have stated in the manuscript on page 7 line 208 that “Similar proportions of Foxp3+ cells were observed in Nrn1-/- and Ctrl cells under the iTreg culture condition, suggesting that Nrn1 deficiency does not significantly impact Foxp3+ cell differentiation”. In the revised manuscript, we will include the data on the proportion of Foxp3+ cells before iTreg restimulation.

      (3) Confirmation of transcriptomic data regarding amino acids or electrolytes transport change

      Minor point “(3) Would not it be possible to perform experiments showing the ability of cells to transport amino acids or electrolytes across the plasma membrane? This would be a more interesting demonstration than transcriptomic data.”

      We appreciate Reviewer #1’s suggestion to “perform experiments showing the ability of cells to transport amino acids or electrolytes across the plasma membrane”. We have indeed already performed such experiments, which corroborate the transcriptomics data on differential amino acid and nutrient transporter expression. Specifically, we loaded either iTreg or Th0 cells with membrane potential (MP) dye and measured the change in MP level after adding the complete set of amino acids (complete AA). Upon entry, the charge carried by AAs may transiently affect cell membrane potential. Different AA transporter expression patterns may produce different patterns of MP change upon AA entry, as we show in Author response image 1. We observed reduced MP change in Nrn1-/- iTreg cells compared to the Ctrl, whereas in Th0 cells, Nrn1-/- showed enhanced MP change compared to the Ctrl. We can certainly include these data in the revised manuscript.

      Author response image 1.

      Membrane potential change induced by amino acids entry. a. Nrn1-/- or WT iTreg cells loaded with MP dye and MP change was measured upon the addition of a complete set of AAs. b. Nrn1-/- or WT Th0 cells loaded with MP dye and MP change was measured upon the addition of a complete set of AAs.

      (4) EAE experiment data assessment

      Minor point “(5) Figure 5F. How are cells re-stimulated? If polyclonal stimulation is used, the experiment is not interesting because the analysis is done with lymph node cells. This analysis should either be performed with cells from the CNS or with MOG restimulation with lymph node cells.”

      In the EAE study, the Nrn1-/- mice exhibit similar disease onset but a protracted non-resolving disease phenotype compared to the WT control mice.  Several reasons may contribute to this phenotype: 1. Enhanced T effector cell infiltration/persistence in the central nervous system (CNS); 2. Reduced Treg cell-mediated suppression of the T effector cells in the CNS; 3. Protracted non-resolving inflammation at the immunization site, which has the potential to continue sending T effector cells into the CNS, contributing to persistent inflammation. Based on this reasoning, we examined the infiltrating T effector cell number and Treg cell proportion in the CNS.  We also restimulated cells from draining lymph nodes close to the inflammation site, looking for evidence of persistent inflammation.  When mice were harvested around day 16 after immunization, the inflammation at the local draining lymph node should be at the contraction stage.  We stimulated cells with PMA and ionomycin in order to observe all potential T effector cells involved in the draining lymph node rather than only MOG antigen-specific cells.  We disagree with Reviewer #1’s assumption that “This analysis should either be performed with cells from the CNS or with MOG restimulation with lymph node cells.” We think the experimental approach we have taken has been appropriately tailored to the biological questions we intended to answer.

      Experimental rigor and data presentation.

      (1) Data labeling and additional supporting data

      Major points (2) The authors use Nrn1+/+ and Nrn1+/- cells indiscriminately as control cells on the basis of similar biology between Nrn1+/+ and Nrn1+/- cells at homeostasis. However, it is quite possible that the Nrn1+/- cells have a phenotype in situations of in vitro activation or in vivo inflammation (cancer, EAE). It would be important to discriminate Nrn1+/- and Nrn1+/+ cells in the data or to show that both cell types have the same phenotype in these conditions too.

      (3) Figure 1A-D. Since the authors are using the Nrp1 KO mice, it would be important to confirm the specificity of the anti-Nrn1 mAb by FACS. Once verified, it would be important to add FACS results with this mAb in Figures 1A-C to have single-cell and quantitative data as well.

      Minor points  

      (1) Line 119, 120 of the text. It is said that one of the most up-regulated genes in anergic cells is Nrn1 but the data is not shown.

      (2) For all figures showing %, the titles of the Y axes are written in an odd way. For example, it is written "Foxp3% CD4". It would be more conventional and clearer to write "% Foxp3+ / CD4+" or "% Foxp3+ among CD4+".

      (4) For certain staining (Figure 3E, H) it would be important to show the raw data, in addition to MFI or % values.

      We can adapt the labeling and provide additional data, including Nrn1 staining on Treg cells and flow graphs for pmTOR and pS6 staining (Fig. 3H), as requested by Reviewer #1.

      (2) Experimental rigor:

      General comments:

      “However, it is disappointing that reading this manuscript leaves an impression of incomplete work done too quickly.”

      We were discouraged to receive the comment, “this manuscript leaves an impression of incomplete work done too quickly.” Our study of this novel molecule began without any existing biological tools such as antibodies, knockout mice, etc.  Over the past several years, we have established our own antibodies for Nrn1 detection, obtained and characterized Nrn1 knockout mice, and utilized multiple approaches to identify the molecular mechanism of Nrn1 function. Through the use of the in vitro iTreg system described in this manuscript, we identified the association of Nrn1 deficiency with cell electrical state change, potentially connected to AMPAR function. We have further corroborated our findings by generating Nrn1 and AMPAR T cell specific double knockout mice and confirmed that T cell specific AMPAR deletion could abrogate the phenotype caused by the Nrn1 deficiency (see Author response image 2).  We did not include the double knockout data in the current manuscript because AMPAR function has not yet been studied thoroughly in T cell biology, and we feel this topic warrants examination in its own right.  However, the unpublished data support the finding that Nrn1 modulates the T cell electrical state and, consequently, metabolism, ultimately influencing tolerance and immunity.  In its current form, the manuscript represents the first characterization of the novel molecule Nrn1 in anergic cells, Tregs, and effector T cells. While this work has led to several exciting additional questions, we disagree that the novel characterization we have presented is incomplete. We feel that our present data set, which squarely highlights Nrn1’s role as an important immune regulator while shedding unprecedented light on the molecular events involved, will be of considerable interest to a broad field of researchers.

      “Multiple models have been used, but none has been studied thoroughly enough to provide really conclusive and unambiguous data. For example, 5 different models were used to study T cells in vivo. It would have been preferable to use fewer, but to go further in the study of mechanisms.”

      We have indeed used multiple in vivo models to reveal Nrn1's function in Treg differentiation, Treg suppression function, T effector cell differentiation and function, and the overall impact on autoimmune disease. Because the impact of ion channel function is often context-dependent, we examined the biological outcome of Nrn1 deficiency in several in vivo contexts.  We would appreciate it if Reviewer#1 would provide a specific example, given the Nrn1 phenotype, of how to proceed deeper to investigate the electrical change in the in vivo models.

      “Major points (1) A real weakness of this work is the fact that in most of the results shown, there are few biological replicates with differences that are often small between Ctrl and Nrn1 -/-. The systematic use of student's t-test may lead to thinking that the differences are significant, which is often misleading given the small number of samples, which makes it impossible to know whether the distributions are Gaussian and whether a parametric test can be used. RNAseq bulk data are based on biological duplicates, which is open to criticism.”

      We respectfully disagree with Reviewer #1 on the question of statistical power and significance in our work. We have used 5-8 mice/group for each in vivo model and 3-4 technical replicates for the in vitro studies, with a minimum of 2-3 replicate experiments. These group sizes and replication numbers are in line with those seen in high-impact publications. While some differences between Ctrl and Nrn1-/- appear small, they have significant biological consequences, as evidenced by the various Nrn1-/- in vivo phenotypes. Furthermore, we believe we have subjected our data to the appropriate statistical tests to ensure rigorous analysis and representation of our findings.

      To Reviewer #2.

      We thank Reviewer #2 for the careful review of the manuscript. We especially appreciate the comments that “The characterizations of T cell Nrn1 expression both in vitro and in vivo are comprehensive and convincing. The in vivo functional studies of anergy development, Treg suppression, and EAE development are also well done to strengthen the notion that Nrn1 is an important regulator of CD4 responsiveness.”

      “The major weakness of this study stems from a lack of a clear molecular mechanism involving Nrn1.”

      We fully understand this comment from Reviewer #2. The main mechanism we identified contributing to the functional defect of Nrn1-/- T cells involves novel effects on the electric and metabolic state of the cells. Although we referenced neuronal studies that indicate Nrn1 is the auxiliary protein for the ionotropic AMPA-type glutamate receptor (AMPAR) and may affect AMPAR function, we did not provide any evidence in this manuscript as the topic requires further in-depth study.   

      For the benefit of this discussion, we include our preliminary Nrn1 and AMPAR double knockout data (Author response image 2), which indicates that abrogating AMPAR expression can compensate for the defect caused by Nrn1 deficiency in vitro and in vivo. This preliminary data supports the notion that Nrn1 modulates AMPAR function, which causes changes in T cell electric and metabolic state, influencing T cell differentiation and function.  

      Author response image 2.

      Deletion of AMPAR expression in T cells compensates for the defect caused by Nrn1 deficiency. Nrn1-/- mice were crossed with T cell-specific AMPAR knockout (AMPARfl/flCD4Cre+) mice. The following mice were generated and used in the experiment: T cell-specific AMPAR knockout and Nrn1 knockout mice (AKONKO), Nrn1 knockout mice (AWTNKO), and Ctrl mice (AWTNWT). a. Deletion of AMPAR compensates for the iTreg cell defect observed in Nrn1-/- CD4 cells. iTreg live cell proportion, cell number, and Ki67 expression among Foxp3+ cells 3 days after aCD3 restimulation. b. Deletion of AMPAR in T cells abrogates the enhanced autoimmune response in Nrn1-/- mice in the EAE disease model. Mouse relative weight change and disease score progression after EAE disease induction.

      Ion channels can influence cell metabolism through multiple means (Vaeth and Feske, 2018; Wang et al., 2020). First, ion channels are involved in maintaining cell resting membrane potential. This electrical potential difference across the cell membrane is essential for various cellular processes, including metabolism (Abdul Kadir et al., 2018; Blackiston et al., 2009; Nagy et al., 2018; Yu et al., 2022). Second, ion channels facilitate the movement of ions across cell membranes. These ions are essential for various metabolic processes. For example, ions like calcium (Ca2+), potassium (K+), and sodium (Na+) play crucial roles in signaling pathways that regulate metabolism (Kahlfuss et al., 2020). Third, ion channel activity can influence cellular energy balance due to ATP consumption associated with ion transport to maintain ion balances (Erecińska and Dagani, 1990; Gerkau et al., 2019). This, in turn, can impact processes like ATP production, which is central to cellular metabolism. Thus, ion channel expression and function determine the cell’s bioelectric state and contribute to cell metabolism (Levin, 2021).

      Because the AMPAR function has not been thoroughly studied using a genetic approach in T cells, we do not intend to include the double knockout data in this manuscript before fully characterizing the T cell-specific AMPAR knockout mice.  

      “Although the biochemical and informatics studies are well-performed, it is my opinion that these results are inconclusive in part due to the absence of key "naive" control groups. This limits my ability to understand the significance of these data.

      Specifically, studies of the electrical and metabolic state of Nrn1-/- inducible Treg cells (iTregs) would benefit from similar data collected from wild-type and Nrn1-/- naive CD4 T cells.”

      We appreciate the reviewer’s comments. This comment reflects two concerns in data interpretation:

      (1) Are Nrn1-/- naïve T cells fundamentally different from WT cells? Does this fundamental difference contribute to the observed electrical and metabolic phenotype in iTreg or Th0 cells? This is a very good question, and we will perform the experiments as the reviewer suggested. While Nrn1 is expressed at a basal (low) level in naïve T cells, deletion of Nrn1 may cause changes in naïve T cell phenotype.

      (2) Is the Nrn1-/- phenotype caused by Nrn1 functional deficiency or due to the secondary effect of Nrn1 deletion, such as non-physiological cell membrane structure changes?

      We have done the following experiment to address this concern.  We have cultured WT T cells in the presence of Nrn1 antibody and compared the outcome with Nrn1-/- iTreg cells (Author response image 3). WT iTreg cells under antibody blockade exhibited similar changes as Nrn1-/- iTreg cells, confirming the physiological relevance of the Nrn1-/- phenotype.

      Author response image 3.

      Nrn1 antibody blockade in WT iTreg cell culture caused phenotypic changes similar to those in Nrn1-/- iTreg cells. Nrn1-/- and WT CD4 cells were differentiated under iTreg conditions in the presence of anti-Nrn1 (aNrn1) antibody or isotype control for 3 days. Cells were restimulated with anti-CD3 in the presence of aNrn1 or isotype. a. MP measured 18 hr after anti-CD3 restimulation. b. Live CD4 cell number and proportion of Ki67 expression among live cells three days after restimulation. c. The proportion of Foxp3+ cells among live cells three days after restimulation.

      References:

      Abdul Kadir, L., M. Stacey, and R. Barrett-Jolley. 2018. Emerging Roles of the Membrane Potential: Action Beyond the Action Potential. Front Physiol 9:1661.

      Blackiston, D.J., K.A. McLaughlin, and M. Levin. 2009. Bioelectric controls of cell proliferation: ion channels, membrane voltage and the cell cycle. Cell Cycle 8:3527-3536.

      Chappert, P., and R.H. Schwartz. 2010. Induction of T cell anergy: integration of environmental cues and infectious tolerance. Current opinion in immunology 22:552-559.

      Chen, W., W. Jin, N. Hardegen, K.J. Lei, L. Li, N. Marinos, G. McGrady, and S.M. Wahl. 2003. Conversion of peripheral CD4+CD25- naive T cells to CD4+CD25+ regulatory T cells by TGF-beta induction of transcription factor Foxp3. The Journal of experimental medicine 198:1875-1886.

      Erecińska, M., and F. Dagani. 1990. Relationships between the neuronal sodium/potassium pump and energy metabolism. Effects of K+, Na+, and adenosine triphosphate in isolated brain synaptosomes. J Gen Physiol 95:591-616.

      Fathman, C.G., and N.B. Lineberry. 2007. Molecular mechanisms of CD4+ T-cell anergy. Nat Rev Immunol 7:599-609.

      Gerkau, N.J., R. Lerchundi, J.S.E. Nelson, M. Lantermann, J. Meyer, J. Hirrlinger, and C.R. Rose. 2019. Relation between activity-induced intracellular sodium transients and ATP dynamics in mouse hippocampal neurons. The Journal of physiology 597:5687-5705.

      Hurrell, B.P., D.G. Helou, E. Howard, J.D. Painter, P. Shafiei-Jahani, A.H. Sharpe, and O. Akbari. 2022. PD-L2 controls peripherally induced regulatory T cells by maintaining metabolic activity and Foxp3 stability. Nature communications 13:5118.

      Jenkins, M.K., and R.H. Schwartz. 1987. Antigen presentation by chemically modified splenocytes induces antigen-specific T cell unresponsiveness in vitro and in vivo. The Journal of experimental medicine 165:302-319.

      John, P., M.C. Pulanco, P.M. Galbo, Jr., Y. Wei, K.C. Ohaegbulam, D. Zheng, and X. Zang. 2022. The immune checkpoint B7x expands tumor-infiltrating Tregs and promotes resistance to anti-CTLA-4 therapy. Nature communications 13:2506.

      Kahlfuss, S., U. Kaufmann, A.R. Concepcion, L. Noyer, D. Raphael, M. Vaeth, J. Yang, P. Pancholi, M. Maus, J. Muller, L. Kozhaya, A. Khodadadi-Jamayran, Z. Sun, P. Shaw, D. Unutmaz, P.B. Stathopulos, C. Feist, S.B. Cameron, S.E. Turvey, and S. Feske. 2020. STIM1-mediated calcium influx controls antifungal immunity and the metabolic function of nonpathogenic Th17 cells. EMBO molecular medicine 12:e11592.

      Levin, M. 2021. Bioelectric signaling: Reprogrammable circuits underlying embryogenesis, regeneration, and cancer. Cell 184:1971-1989.

      Nagy, E., G. Mocsar, V. Sebestyen, J. Volko, F. Papp, K. Toth, S. Damjanovich, G. Panyi, T.A. Waldmann, A. Bodnar, and G. Vamosi. 2018. Membrane Potential Distinctly Modulates Mobility and Signaling of IL-2 and IL-15 Receptors in T Cells. Biophys J 114:2473-2482.

      Quill, H., and R.H. Schwartz. 1987. Stimulation of normal inducer T cell clones with antigen presented by purified Ia molecules in planar lipid membranes: specific induction of a long-lived state of proliferative nonresponsiveness. Journal of immunology (Baltimore, Md. : 1950) 138:3704-3712.

      Schmitt, E.G., and C.B. Williams. 2013. Generation and function of induced regulatory T cells. Frontiers in immunology 4:152.

      Sugiura, A., G. Andrejeva, K. Voss, D.R. Heintzman, X. Xu, M.Z. Madden, X. Ye, K.L. Beier, N.U. Chowdhury, M.M. Wolf, A.C. Young, D.L. Greenwood, A.E. Sewell, S.K. Shahi, S.N. Freedman, A.M. Cameron, P. Foerch, T. Bourne, J.C. Garcia-Canaveras, J. Karijolich, D.C. Newcomb, A.K. Mangalam, J.D. Rabinowitz, and J.C. Rathmell. 2022. MTHFD2 is a metabolic checkpoint controlling effector and regulatory T cell fate and function. Immunity 55:65-81.e69.

      Vaeth, M., and S. Feske. 2018. Ion channelopathies of the immune system. Current opinion in immunology 52:39-50.

      Vanasek, T.L., S.L. Nandiwada, M.K. Jenkins, and D.L. Mueller. 2006. CD25+Foxp3+ regulatory T cells facilitate CD4+ T cell clonal anergy induction during the recovery from lymphopenia. Journal of immunology (Baltimore, Md. : 1950) 176:5880-5889.

      Wang, Y., A. Tao, M. Vaeth, and S. Feske. 2020. Calcium regulation of T cell metabolism. Current opinion in physiology 17:207-223.

      Yu, W., Z. Wang, X. Yu, Y. Zhao, Z. Xie, K. Zhang, Z. Chi, S. Chen, T. Xu, D. Jiang, X. Guo, M. Li, J. Zhang, H. Fang, D. Yang, Y. Guo, X. Yang, X. Zhang, Y. Wu, W. Yang, and D. Wang. 2022. Kir2.1-mediated membrane potential promotes nutrient acquisition and inflammation through regulation of nutrient transporters. Nature communications 13:3544.

      Zheng, S.G., J.D. Gray, K. Ohtsuka, S. Yamagiwa, and D.A. Horwitz. 2002. Generation ex vivo of TGF-beta-producing regulatory T cells from CD4+CD25- precursors. Journal of immunology (Baltimore, Md. : 1950) 169:4183-4189.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      Loh and colleagues investigate valence encoding in the mesolimbic dopamine system. Using an elegant approach, they show that when sucrose, which normally evokes strong dopamine neuron activity and release in the nucleus accumbens, is made aversive via conditioned taste aversion, the same sucrose stimulus later evokes much less dopamine neuron activity and release. Thus, dopamine activity can dynamically track the changing valence of an unconditioned stimulus. These results are important for helping clarify valence- and value-related questions about dopamine function that are the matter of ongoing debate in the field.

      Strengths:

      This is an elegant way to ask this question: the within-subjects design and the continuity of the stimulus are a strong way to remove many of the common confounds that make valence-related questions difficult to interpret. I think these are valuable studies that help tie up questions in the field while also setting up a number of interesting future directions. There are a number of control experiments and tweaks to the design that help eliminate competing hypotheses regarding the results. The data are clearly presented and contextualized.

      Weaknesses for consideration:

      The focus on one relatively understudied region of the rat striatum for dopamine recordings could potentially limit generalization of the findings. While this can be determined in future studies, the implications should be further discussed in the current manuscript.

      We agree that the manuscript would benefit from providing a stronger rationale for our recording sites and acknowledging the potential for regional differences in dopamine signaling. We have made the following additions to the manuscript:

      Added to the Discussion: “Recordings were targeted to the lateral VTA and the corresponding approximate terminal site in the NAc lateral shell (Lammel et al., 2008). Subregional differences in dopamine activity likely contribute to mixed findings on dopamine and affect. For example, dopamine in the NAc lateral shell differentially encodes cues predictive of rewarding sucrose and aversive footshock, which is distinct from NAc medial shell dopamine responses (de Jong et al., 2019). Our findings are similar to prior work from our group targeting recordings to the NAc dorsomedial shell (Hsu et al., 2020; McCutcheon et al., 2012; Roitman et al., 2008): there, intraoral sucrose increased NAc dopamine release while the response in the same rats to quinine was significantly lower.”

      Reviewer #2 (Public review):

      Summary:

      Koh et al. report an interesting manuscript studying dopamine binding in the lateral accumbens shell of rats across the course of conditioned taste aversion. The question being asked here is how does the dopamine system respond to aversion? The authors take advantage of unique properties of taste aversion learning (notably, within-subjects remapping of valence to the same physical stimulus) to address this.

      Combining a well-controlled behavioural design (including key unpaired controls) with fibre photometry of dopamine binding via GrabDA and of dopamine neuron activity via GCaMP, together with careful analyses of behaviour (e.g., head movements; home cage ingestion), the authors show that: 1) conditioned taste aversion of sucrose suppresses the activity of VTA dopamine neurons and lateral shell dopamine binding to subsequent presentations of the sucrose tastant; 2) this pattern of activity was similar to that evoked by the innately aversive tastant quinine; 3) dopamine responses were negatively correlated with behavioural reactivity (inferred taste reactivity); and 4) dopamine responses tracked the contingency between sucrose and illness, because these responses recovered across extinction of the conditioned taste aversion.

      Strengths:

      There are important strengths here: the use of a well-controlled design, the measurement of both dopamine binding and VTA dopamine neuron activity, the inclusion of an extinction manipulation, and the thorough reporting of the data. I was not especially surprised by these results, but these data are a potentially important piece of the dopamine puzzle (e.g., as the authors note, a salience-based argument struggles to explain these data).

      Weaknesses for consideration:

      (1) The focus here is on the lateral shell. This is a poorly investigated region in the context of the questions being asked here. Indeed, I suspect many readers might expect a focus on the medial shell. So, I think this focus is important. But I think it does warrant greater attention in both the introduction and discussion. We do know from past work that there can be extensive compartmentalisation of dopamine responses to appetitive and aversive events, and many of the inconsistent findings in the literature can be reconciled by careful examination of where dopamine is assessed. I do think readers would benefit from acknowledgement of this: for example, it is entirely reasonable to suppose that the findings here may be specific to the lateral shell.

      As with our response to Reviewer 1, we agree that we should provide further rationale for focusing our recordings on the lateral shell and acknowledge potential differences in dopamine dynamics across NAc subregions. In addition to the changes in the Discussion detailed in our response to Reviewer 1, we have made the following additions to the Introduction:

      Added to the Introduction: “NAc lateral shell dopamine differentially encodes cues predictive of rewarding (i.e., sipper spout with sucrose) and aversive stimuli (i.e., footshock), which is distinct from other subregions (de Jong et al., 2019). It is important to note that other regions of the NAc may serve as hedonic hotspots (e.g., dorsomedial shell) or may more closely align with the signaling of salience (e.g., ventromedial shell; Yuan et al., 2021).”

      (2) Relatedly, I think readers would benefit from an explicit rationale for studying the lateral shell as well as consideration of this in the discussion. We know that there are anatomical (PMID: 17574681), functional (PMID: 10357457), and cellular (PMID: 7906426) differences between the lateral shell and the rest of the ventral striatum. Critically, we know that profiles of dopamine binding during ingestive behaviours there can be highly dissimilar to the rest of ventral striatum (PMID: 32669355). I do think these points are worth considering.

      There are several reasons why dopamine dynamics were recorded in the NAc lateral shell:

      (1) Dopamine neurons in more medial aspects of the VTA preferentially target the NAc medial shell and core whereas dopamine neurons in the lateral VTA – our target for VTA DA recordings – project to the lateral shell of the NAc (Lammel et al., 2008). Thus, our goal was to sample NAc release dynamics in areas that receive projections from our cell body recording sites.

      (2) Cues predictive of reward availability (i.e., sipper spout with sucrose) and aversive stimuli (i.e., footshock) are differentially encoded by NAc lateral shell dopamine, which is distinct from NAc ventromedial shell dopamine responses (de Jong et al., 2019). These findings suggest a role for NAc lateral shell dopamine in the encoding of a stimulus’s valence, which made the subregion an area of interest for further examination.

      (3) With respect to the medial NAc shell specifically, extensive literature had already shown it to be a ‘hedonic hotspot’ (Morales and Berridge, 2020; Yuan et al., 2021), whereas the ventral portion is more mixed with respect to valence (Yuan et al., 2021). We had previously shown that intraoral infusions of primary taste stimuli of opposing valence (i.e., sucrose and quinine) evoke differential responses in dopamine release within the NAc dorsomedial shell (Roitman et al., 2008). We more recently replicated differential dopamine responses from dopamine cell bodies in the lateral VTA (Hsu et al., 2020) and thus endeavored to examine the possibility of changing dopamine responses in the lateral VTA to the same stimulus as its valence changes. As a result of these choices, measuring dopamine release in the lateral shell was a logical choice. The field would greatly benefit from continued future work surveying the entirety of the VTA DA projection terminus.

      We have included these points of justification in the Introduction and Discussion sections.

      (3) I found the data to be very thoughtfully analysed. But in places I was somewhat unsure:

      (a) Please indicate clearly in the text when photometry data show averages across trials versus when they show averages across animals.

      We have now explicitly indicated in the figure legends of Figures 1, 3, 5, 7, and 8:

      (1) In heat maps, each row represents the averaged (across rats) response on that trial.

      (2) Traces below heat maps represent the response to infusion averaged first across trials for each rat and then across all rats.

      (3) Insets represent the average z-score across the infusion period averaged first across all trials for each rat and then across all rats.
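      The three averaging conventions listed above differ only in the order in which trials and rats are collapsed. The following is a minimal sketch of that order, not the analysis code used for the manuscript. The function names, the nested-list layout (rats, then trials, then samples) and the toy values are all assumptions made for illustration.

```python
from statistics import mean

def per_rat_trace(trials):
    """Average one rat's trials sample-by-sample (trials: equal-length lists)."""
    return [mean(samples) for samples in zip(*trials)]

def group_trace(rats):
    """Trace below a heat map: average across trials per rat, then across rats."""
    return [mean(samples) for samples in zip(*(per_rat_trace(t) for t in rats))]

def heatmap_rows(rats):
    """Heat map: one row per trial, each row averaged across rats."""
    n_trials = min(len(trials) for trials in rats)
    return [
        [mean(samples) for samples in zip(*(trials[i] for trials in rats))]
        for i in range(n_trials)
    ]

def inset_value(rats, start, stop):
    """Inset: mean z-score over the infusion window, per rat first, then across rats."""
    return mean(mean(per_rat_trace(trials)[start:stop]) for trials in rats)

# Hypothetical toy data: 2 rats x 2 trials x 3 samples of z-scored signal.
rats = [
    [[0.0, 1.0, 2.0], [2.0, 3.0, 4.0]],
    [[1.0, 1.0, 1.0], [3.0, 3.0, 3.0]],
]

print(group_trace(rats))        # per-rat means, then the mean across rats
print(heatmap_rows(rats))       # across-rat mean for each trial
print(inset_value(rats, 0, 3))  # window mean per rat, then across rats
```

      Averaging within each rat first gives every animal equal weight in the group trace, even if the number of usable trials differs between rats.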

      (b) I did struggle with the correlation analyses, for two reasons.

      (i) First, the key finding here is that the dopamine response to intraoral sucrose is suppressed by taste aversion. So, this will significantly restrict the range of dopamine transients, making interpretation of the correlations difficult.

      The overall hypothesis is that the dopamine response would correlate with the valence of a taste stimulus – even and especially when the stimulus remained constant but its valence changed. We inferred valence from the behavioral reactivity to the stimulus – reasoning that an appetitive taste will evoke minimal movement of the nose and paws (presumably because the animals are primarily engaging in small mouth movements associated with ingestion as shown by the seminal work of Grill and Norgren (1978) and the many studies published by the K.C. Berridge group) whereas an aversive taste will evoke significantly more movement as the rats engage in rejection responses (e.g. forelimb flails, chin rubs, etc.). When we conducted our regression analyses we endeavored to be as transparent as possible and labeled each symbol based on group (Unpaired vs Paired) and day (Conditioning vs Test). Both behavioral reactivity and dopamine responses change – but only for the Paired rats across days. In this sense, we believe the interpretation is clear. However, the Reviewer raises an important criticism that there would essentially be a floor effect with dopamine responses. We believe this is mitigated by data acquired across extinction and especially in Figure 9B. Here, the observations that dopamine responses fall to near zero but return to pre-conditioning levels in the Paired group with strong correlation between dopamine and behavioral reactivity throughout would hopefully partially allay the Reviewer’s concerns. See Part ii below for further support.

      (ii) Second, the authors report correlations by combining data across groups/conditions. I understand why the authors have done this, but it does risk obscuring differences between the groups. So, my question is: what happens to this trend when the correlations are computed separately for each group? I suspect other readers will share the same question. I think reporting these separate correlations would be very helpful for the field, regardless of the outcome.

      To address this concern, we performed separate regression analyses for Paired and Unpaired rats and provide the table below to detail results where data were combined across groups or separated. As expected, all analyses in Paired rats indicated a significant inverse relationship between dopamine and behavioral reactivity. After all, it is only in this group that behavioral reactivity to the taste stimulus changes as a function of conditioning. Perhaps even more striking, even when restricting the regression analysis to Unpaired rats, we still observed a significant inverse relationship between dopamine and behavioral reactivity in most experiments. We have outlined the separated correlations below (asterisks denote slopes significantly different from 0; * p<0.05; ** p<0.01; *** p<0.005; **** p<0.001):

      Author response table 1.
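      For readers who want to see the difference between pooled and per-group fits, the sketch below computes an ordinary least-squares slope and Pearson r for each group and for the combined data. It is illustrative only. The record layout, the choice of behavioral reactivity as the x-variable, and the toy numbers are assumptions rather than values from the study, and testing whether a slope differs from 0 would additionally need a statistics package.

```python
from statistics import mean

def ols_slope_r(x, y):
    """Least-squares slope of y on x and the Pearson correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx, sxy / (sxx * syy) ** 0.5

def fit_by_group(records):
    """records: list of (group, behavioral_reactivity, dopamine_z) tuples."""
    groups = {}
    for group, x, y in records:
        groups.setdefault(group, []).append((x, y))
    # Pooled fit across all groups, alongside the separate fits.
    groups["Combined"] = [(x, y) for _, x, y in records]
    return {name: ols_slope_r(*zip(*points)) for name, points in groups.items()}

# Hypothetical toy data (not study values).
records = [
    ("Paired", 1.0, 3.0), ("Paired", 2.0, 2.0), ("Paired", 3.0, 1.0),
    ("Unpaired", 1.0, 2.0), ("Unpaired", 2.0, 1.5), ("Unpaired", 3.0, 1.0),
]

for name, (slope, r) in fit_by_group(records).items():
    print(f"{name}: slope={slope:.2f}, r={r:.2f}")
```

      In this toy example the pooled slope falls between the two group slopes, which is how a combined fit can mask a difference between groups.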

      (4) Figure 1A is not as helpful as it might be. I do think readers would expect a more precise reporting of GCaMP expression in TH+ and TH- neurons. I also note that many of the nuances in terms of compartmentalisation of dopamine signalling discussed above apply to ventral tegmental area dopamine neurons (e.g. medial v lateral) and this is worth acknowledging when interpreting t

      Others have reported (Choi et al., 2020) and quantified (Hsu et al., 2020) GCaMP6f expression in TH+ neurons. While we didn’t report these quantifications, our observations were very much in line with previous quantifications from our laboratory (Hsu et al. 2020).

      We agree that we should elaborate on VTA subregional differences and have answered this response above (See responses to Reviewer 1 Weakness #1 and Reviewer 2 Weakness #2).

      Reviewer #3 (Public review):

      Summary:

      This study helps to clarify the mixed literature on dopamine responses to aversive stimuli. While it is well accepted that dopamine in the ventral striatum increases in response to various rewarding and appetitive stimuli, aversive stimuli have been shown to evoke phasic increases or decreases depending on the exact aversive stimulus, behavioral paradigm, and/or dopamine recording method and location examined. Here the authors use a well-designed set of experiments to show differential responses to an appetitive primary reward (sucrose) that later becomes a conditioned aversive stimulus (sucrose previously paired with lithium chloride in a conditioned taste aversion paradigm). The results are interesting and add valuable data to the question of how the mesolimbic dopamine system encodes aversive stimuli; however, the conclusions are strongly stated given that the current data do not necessarily align with prior conflicting data in terms of recording location, and it is not clear exactly how to interpret the generally biphasic dopamine response to the CTA-sucrose, which also evolves over exposures within a single session.

      Strengths:

      • The authors nicely demonstrate that their two aversive stimuli examined, quinine and sucrose following CTA, evoked aversive facial expressions and paw movements that differed from those following rewarding sucrose to support that the stimuli experienced by the rats differ in valence.

      • Examined dopamine responses to the exact same sensory stimuli conditioned to have opposing valences, avoiding standard confounds of appetitive and aversive stimuli being sensed by different sensory modalities (i.e., sweet taste vs. electric shock)

      • The authors examined multiple measurements of dopamine activity - cell body calcium (GCaMP6f) in midbrain and release in NAc (Grab-DA2h), which is useful as the prior mixed literature on aversive dopamine responses comes from a variety of recording methods.

      • Correlations between sucrose preference and dopamine signals demonstrate behavioral relevance of the differential dopamine signals.

      • The delayed testing experiment in Figure 7 nicely controls for the effect of time to demonstrate that the "rewarding" dopamine response to sucrose only recovers after multiple extinction sucrose exposures to extinguish the CTA.

      Weaknesses for consideration:

      (1) Regional differences in dopamine signaling to aversive stimuli are mentioned in the introduction and discussion. For instance, the idea that dopamine encodes salience is strongly argued against in the discussion, but the paper cited as arguing for that (Kutlu et al. 2021) is recording from the medial core in mice. Given other papers cited in the text about the regional differences in dopamine signaling in the NAc and from different populations of dopamine neurons in midbrain, it's important to mention this distinction with respect to salience signaling. Relatedly, the text says that the lateral NAc shell was targeted for accumbens recordings, but the histology figure looks like the majority of fibers were in the anterior lateral core of NAc. For the current paper to be a convincing last word on the issue, it would be extremely helpful to have similar recordings done in other parts of the NAc to do a more thorough comparison against other studies.

      As the Reviewer notes, NAc dopamine recordings were aimed at the lateral NAc shell. It is possible that some dopamine neurons lying within the anterior lateral core were recorded. Fiber photometry and the size of the fiber optics cannot definitively identify the precise location and number of dopamine neurons from which we recorded. Still, recording sites did not systematically differ between groups. Further, the within-subjects design helps to mitigate any potential biases for one subregion over another. The results presented in the manuscript strongly support a valence code. It is difficult to be the ‘last word’ on this topic and we suspect debate will continue. We used taste stimuli for appetitive and aversive stimuli – whereas many in the field will continue to use other noxious stimuli (e.g. foot shock) that likely recruit different circuits en route to the VTA. And there may very well be a different regional profile for dopamine signaling with different noxious stimuli. Moreover, we used intraoral infusion to avoid confounds of stimulus avoidance and competing motivations (e.g. food or fluid deprivation). We believe that this is one of the most important and unique features of our report. Recent work supports a role for phasic increases in dopamine in avoidance of noxious stimuli (Jung et al., 2024) and it will be critical for the field to reflect on the differences between avoidance and aversion. Moreover, in ongoing studies we aspire to fully survey dopamine signaling in conditioned taste aversion across the medial-lateral and dorsal-ventral axes of the VTA and NAc.

      (2) Dopamine release in the NAc never dips below baseline for the conditioned sucrose. Is it possible to really consider this as a signal for valence per se, as opposed to it being a weaker response relative to the original sucrose response?

      Indeed, NAc dopamine release in response to neither intraoral quinine nor aversive sucrose dips below baseline; rather, dopamine binding does not change from pre-infusion baseline levels. It should be noted that VTA dopamine cell body activity does indeed dip below baseline in response to aversive sucrose. Moreover, using fast-scan cyclic voltammetry, we showed that dopamine release dips below baseline in the NAc dorsomedial shell in response to intraoral quinine (Roitman et al., 2008). The differences across recording sites may reflect regional differences, but they may also reflect differences in recording approaches. GrabDA2h, used here, has relatively slow kinetics that may obscure dips below baseline (see response to Weakness #8 below).

      (3) Related to this, the main measure of the dopamine signal here, "mean z-score," obscures the temporal dynamics of the aversive dopamine response across a trial. This measure is used to claim that sucrose after CTA is "suppressing" dopamine neuron activity and release, which is true relative to the positive valence sucrose response. However, both GRAB-DA and cell-body GCaMP measurements show clear increases after onset of sucrose infusion before dipping back to baseline or slightly below in the average of all example experiments displayed. One could point to these data to argue either that aversive stimuli cause phasic increases in dopamine (due to the initial increase) or decreases (due to the delayed dip below baseline) depending on the measurement window. Some discussion of the dynamics of the response and how it relates to the prior literature would be useful.

      We have used mean z-score for much of our quantitative analysis, but the Reviewer raises the intriguing possibility that, by doing so, we are masking an initial increase in dopamine release and VTA DA activity evoked by aversive taste. We included the heat maps in the manuscript to be as transparent as possible about the time course of dopamine responses, both within a trial and across trials. The Reviewer’s point prompted us to reflect further on the heat maps and recognize that trials early in the session often showed a brief increase in dopamine for aversive sucrose, but this response dissipated (NAc dopamine release) or flipped (VTA DA cell body activity) over trials. We now quantitatively characterize this feature by looking at the time course of dopamine responses in each third of the trials (1-10, 11-20, 21-30; see Author response images 1, 2, and 3). As we infer the valence of the stimulus from nose and paw movements (behavioral reactivity), it is especially striking that we observed a similar time course for changes in behavior. Collectively, the data may reflect an updating process that is relatively slow and requires experience of the stimulus in a new (aversive) state, that is, a model-free process. While our experiments were not designed to test the updating of dopamine responses and discern their participation in model-based versus model-free learning processes, another debate in the dopamine field (Cone et al., 2016; Deserno et al., 2021), the data are consistent with a model-free process. This is further supported in the experiment involving multiple conditioning sessions, where dopamine ‘dips’ are observed in trials 1-10 on Conditioning Day 3 and Extinction Day 1, when the new value of sucrose has been established. Finally, the relatively slow updating of the value of sucrose is reflected in older literature using a continuous intraoral infusion. Using this approach, rats began rejecting the saccharin infusion only after ~2 min rather than immediately (Schafe et al., 1998; Schafe and Bernstein, 1996; Wilkins and Bernstein, 2006).
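      Splitting a session into consecutive trial blocks (for example trials 1-10, 11-20 and 21-30) and averaging the time course within each block can be sketched as follows. This is an illustration under stated assumptions rather than the analysis code itself: the function name, the list-of-traces layout and the toy values are hypothetical, and the toy example uses a block size of 2 only to stay short.

```python
from statistics import mean

def block_traces(trials, block_size=10):
    """Average the time course over consecutive blocks of trials.

    trials: list of equal-length traces (one per trial, in session order).
    Returns one sample-by-sample mean trace per block.
    """
    blocks = [trials[i:i + block_size] for i in range(0, len(trials), block_size)]
    return [[mean(samples) for samples in zip(*block)] for block in blocks]

# Hypothetical toy session: 6 trials x 3 samples of z-scored signal,
# with an early transient that fades over trials.
session = [
    [0.0, 2.0, 0.0], [0.0, 1.0, 0.0],     # early trials
    [0.0, 0.5, -0.5], [0.0, 0.5, -0.5],   # middle trials
    [0.0, 0.0, -1.0], [0.0, 0.0, -1.0],   # late trials
]

for index, trace in enumerate(block_traces(session, block_size=2), start=1):
    print(f"block {index}: {trace}")
```

      With 30 trials and the default block size of 10, the same function would return three traces, one per third of the session.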

      Author response image 1.

      Author response image 2.

      Author response image 3.

      (4) Would this delayed below-baseline dip be visible with a shorter infusion time?

      While our experiments did not explore this parameter, it would be interesting to parametrically vary infusion duration times and examine differences in dopamine responses. However, we believe the most parsimonious explanation is that the ‘dip’ in VTA cell body activity develops as a function of the slow updating of the value of sucrose reflective of a model-free process. We recognize that this is mere speculation.

      (5) Does the max of the increase or the dip of the decrease better correlate with the behavioral measures of aversion (orofacial, paw movements) or sucrose preference than "mean z-score" measure used here?

      It seems plausible that finding the most extreme value from baseline could better correlate to behavioral measures. Time courses to max increase and max decrease are different. Moreover, with appetitive sucrose, there are often multiple transients that occur throughout a single intraoral infusion. Coupled with a noisy time course for individual components of behavioral reactivity, we determined that averaging data across the whole infusion period (i.e. mean z-score) was the most objective way we could analyze the dopamine and behavioral responses to taste stimuli.

      (6) The authors argue strongly in the discussion against the idea that dopamine is encoding "salience." Could this initial peak (also seen in the first few trials of quinine delivery, fig 1c color plot) be a "salience" response?

      Our response above to the potential for ‘mixed’ dopamine responses to aversive sucrose led to additional analyses that support a slow updating of both behavior and dopamine to the new, aversive value of sucrose. Quinine is innately aversive and thus the Reviewer rightly points out that even here we observe an increase in dopamine release evoked by quinine on the first few trials (as observed in the heat map). We’d like to note, though, that the order of stimulus exposure was counterbalanced across rats. In those rats first receiving a sucrose session, quinine initially caused a modest increase in dopamine release during the first 10 trials (which is more pronounced in the first 2 trials). In the subsequent 2 blocks of 10 trials, no such increase was observed. Interestingly, in rats for which quinine was their first stimulus, we did not see an increase in dopamine release on the first few trials (see Author response image 4). We speculate that the initial sucrose session required the value of intraoral infusions to be updated when quinine was delivered to these rats and that, once more, the updating process may be slow and akin to a model-free process. This analysis, at present, is underpowered but will direct future attention in follow-up work.

      Author response image 4.

      (7) Related to this, the color plots showing individual trials show a reduction in the increases to positive-valence sucrose across conditioning day trials and a flip from infusion-onset increases to delayed increases across test day trials. This evolution across days makes it appear that the last few conditioning day trials would be impossible to discriminate from the first few test day trials in the CTA-paired group. Presumably, given the strength of CTA as a paradigm, the sucrose is already aversive to the animals at the first trial of test day. Why do the authors think the response evolves across this session?

      As the Reviewer noted, Points 3-7 are related. We have speculated that the evolving dopamine response in Paired rats across test day trials reflects a model-free process. Importantly, as in the manuscript, our additional analyses once again show a tight relationship between behavioral reactivity and the dopamine response across the test session trials. It is important to note, though, that these experiments were not designed to test if responses reflect model-free or model-based processes.

      (8) Given that most of the work is using a conditioned aversive stimulus, the comparison to a primary aversive tastant quinine is useful. However, the authors saw basically no dopamine response to a primary aversive tastant quinine (measured only with GRAB-DA) and saw less noticeable decreases following CTA for NAc recordings with GRAB-DA2h than with cell body GCaMP. Given that they are using the high-affinity version of the GRAB sensor, this calls into question whether this is a true difference in release vs. soma activity or issue of high affinity release sensor making decreases in dopamine levels more difficult to observe.

      We share the same speculation as the Reviewer. Using fast-scan cyclic voltammetry, albeit measuring dopamine concentration in the dorsomedial shell, we observed a clear decrease from baseline with intraoral infusions of quinine (Roitman et al., 2008). For the fiber photometry used here, we note, as the Reviewer does, that GRAB_DA2h is a high-affinity (i.e., EC50: 7 nM) dopamine sensor with relatively long off-kinetics (i.e., t1/2 decay time: 7300 ms) (Labouesse et al., 2020). It may therefore be much more difficult to observe decreases (below baseline) using this sensor. The publication of new dopamine sensors with lower affinity, faster kinetics, and greater dynamic range (Zhuo et al., 2024) introduces opportunities for comparison and a greater potential for capturing decreases below baseline. Due to the poorer kinetics associated with GRAB_DA2h, we would not assert that direct comparisons between the GCaMP- and GRAB-based signals observed here represent true differences between somatic and terminal activity.
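      To give a rough sense of how slow off-kinetics could blunt a brief decrease, the sketch below passes a hypothetical dopamine time course through a first-order low-pass filter whose half-life is set to 7.3 s. This is a deliberate simplification and not a model of GRAB_DA2h: it applies the same time constant to rises and falls, ignores binding saturation, and uses an invented 1-s pause, an invented baseline and an invented "fast" comparison half-life of 0.1 s.

```python
import math

def sensor_response(signal, dt, t_half):
    """First-order low-pass filter: output relaxes toward input with half-life t_half.

    Assumption: one time constant for both rises and falls (a simplification).
    """
    tau = t_half / math.log(2)          # convert half-life to time constant
    alpha = 1.0 - math.exp(-dt / tau)   # fraction of the gap closed per step
    output, y = [], signal[0]
    for s in signal:
        y += alpha * (s - y)
        output.append(y)
    return output

dt = 0.1  # seconds per sample (hypothetical)
# Hypothetical input: baseline of 1.0, a 1-s pause to 0.0, then back to baseline.
signal = [1.0] * 50 + [0.0] * 10 + [1.0] * 100

slow = sensor_response(signal, dt, t_half=7.3)  # half-life matching the cited 7300 ms
fast = sensor_response(signal, dt, t_half=0.1)  # hypothetical faster sensor

print("dip depth reported by slow sensor:", round(1.0 - min(slow), 3))
print("dip depth reported by fast sensor:", round(1.0 - min(fast), 3))
```

      Under these assumptions, a full 1-s pause appears as only a small fraction of its true depth in the slow trace, which fits the concern that decreases below baseline are harder to detect with a slow, high-affinity sensor.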

      References

      Choi JY, Jang HJ, Ornelas S, Fleming WT, Fürth D, Au J, Bandi A, Engel EA, Witten IB. 2020. A Comparison of Dopaminergic and Cholinergic Populations Reveals Unique Contributions of VTA Dopamine Neurons to Short-Term Memory. Cell Rep 33. doi:10.1016/j.celrep.2020.108492

      Cone JJ, Fortin SM, McHenry JA, Stuber GD, McCutcheon JE, Roitman MF. 2016. Physiological state gates acquisition and expression of mesolimbic reward prediction signals. Proc Natl Acad Sci U S A 113. doi:10.1073/pnas.1519643113

      de Jong JW, Afjei SA, Pollak Dorocic I, Peck JR, Liu C, Kim CK, Tian L, Deisseroth K, Lammel S. 2019. A Neural Circuit Mechanism for Encoding Aversive Stimuli in the Mesolimbic Dopamine System. Neuron 101. doi:10.1016/j.neuron.2018.11.005

      Deserno L, Moran R, Michely J, Lee Y, Dayan P, Dolan RJ. 2021. Dopamine enhances model-free credit assignment through boosting of retrospective model-based inference. Elife 10. doi:10.7554/eLife.67778

      Hsu TM, Bazzino P, Hurh SJ, Konanur VR, Roitman JD, Roitman MF. 2020. Thirst recruits phasic dopamine signaling through subfornical organ neurons. Proc Natl Acad Sci U S A 117:30744–30754. doi:10.1073/pnas.2009233117

      Jung K, Krüssel S, Yoo S, An M, Burke B, Schappaugh N, Choi Y, Gu Z, Blackshaw S, Costa RM, Kwon HB. 2024. Dopamine-mediated formation of a memory module in the nucleus accumbens for goal-directed navigation. Nat Neurosci. doi:10.1038/s41593-024-01770-9

      Labouesse MA, Cola RB, Patriarchi T. 2020. GPCR-based dopamine sensors—A detailed guide to inform sensor choice for in vivo imaging. Int J Mol Sci. doi:10.3390/ijms21218048

      Lammel S, Hetzel A, Häckel O, Jones I, Liss B, Roeper J. 2008. Unique Properties of Mesoprefrontal Neurons within a Dual Mesocorticolimbic Dopamine System. Neuron 57. doi:10.1016/j.neuron.2008.01.022

      McCutcheon JE, Ebner SR, Loriaux AL, Roitman MF, Tobler PN. 2012. Encoding of aversion by dopamine and the nucleus accumbens. Front Neurosci 6. doi:10.3389/fnins.2012.00137

      Morales I, Berridge KC. 2020. ‘Liking’ and ‘wanting’ in eating and food reward: Brain mechanisms and clinical implications. Physiol Behav. doi:10.1016/j.physbeh.2020.113152

      Roitman MF, Wheeler RA, Wightman RM, Carelli RM. 2008. Real-time chemical responses in the nucleus accumbens differentiate rewarding and aversive stimuli. Nat Neurosci 11:1376–1377. doi:10.1038/nn.2219

      Schafe GE, Bernstein IL. 1996. Forebrain contribution to the induction of a brainstem correlate of conditioned taste aversion: I. The amygdala. Brain Res 741. doi:10.1016/S0006-8993(96)00906-7

      Schafe GE, Thiele TE, Bernstein IL. 1998. Conditioning method dramatically alters the role of amygdala in taste aversion learning. Learning and Memory 5. doi:10.1101/lm.5.6.481

      Wilkins EE, Bernstein IL. 2006. Conditioning method determines patterns of c-fos expression following novel taste-illness pairing. Behavioural Brain Research 169. doi:10.1016/j.bbr.2005.12.006

      Yuan L, Dou YN, Sun YG. 2021. Topography of reward and aversion encoding in the mesolimbic dopaminergic system. Journal of Neuroscience 39. doi:10.1523/JNEUROSCI.0271-19.2019

      Zhuo Y, Luo B, Yi X, Dong H, Miao X, Wan J, Williams JT, Campbell MG, Cai R, Qian T, Li F, Weber SJ, Wang L, Li B, Wei Y, Li G, Wang H, Zheng Y, Zhao Y, Wolf ME, Zhu Y, Watabe-Uchida M, Li Y. 2024. Improved green and red GRAB sensors for monitoring dopaminergic activity in vivo. Nat Methods 21. doi:10.1038/s41592-023-02100-w

    1. Author Response:

      We greatly appreciate the invaluable and constructive comments from the Editors and Reviewers, and we thank them for their time and patience. We are pleased that our manuscript was assessed as valuable and solid.

      One of the most critical concerns was the possible involvement of Ca2+ channel inactivation in the strong paired-pulse depression (PPD). We have already measured the total (free plus buffered) calcium increments induced by each of the first four APs in a 40 Hz train at axonal boutons of prelimbic layer 2/3 pyramidal cells. We found that the first four Ca2+ increments did not differ from one another, arguing against a contribution of Ca2+ channel inactivation to PPD. Please see our reply to the 2nd issue in the Weakness section of Reviewer #3.

      The second critical issue concerned the definition of ‘vesicular probability’. Previously, vesicular probability (pv) has been used with reference to the releasable vesicle pool, which includes not only tightly docked vesicles but also reluctant vesicles. In the present study, by contrast, pv denotes the release probability of tightly docked vesicles. We clarified this point in our replies to the 1st issues in the Weakness sections of Reviewer #2 and Reviewer #3.

      Our point-by-point replies to the Reviewers’ other comments are given below.

      Reviewer #2 (Public review):

      Summary:

      Shin et al aim to identify in a very extensive piece of work a mechanism that contributes to dynamic regulation of synaptic output in the rat cortex at the second time scale. This mechanism is related to a new, powerful model that is well suited to test if the pool of SV ready for fusion is dynamically scaled to adjust supply demand aspects. The methods applied are state-of-the-art and both address quantitative aspects with high signal to noise. In addition, the authors examine both excitatory output onto glutamatergic and GABAergic neurons, which provides important information on how general the observed signals are in neural networks. The results are compellingly clear and show that pool regulation may be predominantly responsible. Their results suggest that a regulation of release probability, the alternative contender for regulation, is unlikely to be involved in the observed short term plasticity behavior (but see below). Besides providing a clear analysis of the underlying physiology, they test two molecular contenders for the observed mechanism by showing that loss of Synaptotagmin7 function and the role of the Ca dependent phospholipase activity seems critical for the short term plasticity behavior. The authors go on to test the in vivo role of the mechanism by modulating Syt7 function and examining working memory tasks as well as overall changes in network activity using immediate early gene activity. Finally, they model their data, providing strong support for their interpretation of TS pool occupancy regulation.

      Strengths:

      This is a very thorough study, addressing the research question from many different angles and the experimental execution is superb. The impact of the work is high, as it applies recent models of short term plasticity behavior to in vivo circuits further providing insights how synapses provide dynamic control to enable working memory related behavior through nonpermanent changes in synaptic output.

      Weaknesses:

      While this work is carefully examined and the results are presented and discussed in a detailed manner, the reviewer is still not fully convinced that regulation of release provability is not a putative contributor to the observed behavior. No additional work is needed but at the moment I am not convinced that changes in release probability are not in play. One solution may be to extend the discussion of changes in release probability as an alternative.

      Quantal content (m) depends on n * pv, where n = RRP size and pv = vesicular release probability. The value of pv critically depends on the definition of RRP size. Recent studies revealed that docked vesicles have differential priming states: a loosely or tightly docked state (LS or TS, respectively). Because the RRP size estimated by hypertonic solution or long presynaptic depolarization is larger than that estimated by back-extrapolation of a cumulative EPSC plot (Moulder & Mennerick, 2005; Sakaba, 2006) in glutamatergic synapses, the former RRP (denoted as RRPhyper) may encompass not only AP-evoked fast-releasing vesicles (TS vesicles) but also reluctant vesicles (LS vesicles). Because we measured pv based on AP-evoked EPSCs, such as strong paired-pulse depression (PPD) and associated failure rates, pv in the present study denotes the vesicular fusion probability of TS vesicles, not that of LS plus TS vesicles.

      Recent studies suggest that release sites are not fully occupied by TS vesicles in the baseline (Miki et al., 2016; Pulido and Marty, 2018; Malagon et al., 2020; Lin et al., 2022). Instead, the occupancy (pocc) by TS vesicles is dynamically regulated by forward and backward rate constants (denoted by k1 and b1, respectively). The number of TS vesicles (n) can be factored into the number of release sites (N) and pocc, among which N is a fixed parameter but pocc depends on k1/(k1+b1) under the framework of the simple refilling model (see Methods). Because these refilling rate constants are regulated by Ca2+ (Hosoi et al., 2008), pocc is not a fixed parameter. Therefore, release probability should be re-defined as pocc x pv. In this regard, the increase in release probability is a major player in STF. Our study asserts that STF by 2.3 times can be attributed to an increase in pocc rather than pv, because pv is close to unity (Fig. S8). Moreover, strong PPD was observed not only in the baseline but also early in and in the middle of a train (Fig. 2 and 7) and during the recovery phase (Fig. 3), arguing against a gradual increase in pv of reluctant vesicles.
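
      The factorization described above (n = N x pocc, release probability = pocc x pv, pocc = k1/(k1+b1)) can be illustrated with a short numerical sketch. The rate constants and the number of release sites below are hypothetical values chosen only so that the occupancy rises from 0.3 to 0.69; they are not measurements from the manuscript, and the function names are ours.

```python
def occupancy(k1: float, b1: float) -> float:
    """Steady-state occupancy of release sites by TS vesicles, k1 / (k1 + b1)."""
    return k1 / (k1 + b1)


def release_probability(p_occ: float, p_v: float) -> float:
    """Release probability per site, re-defined as p_occ * p_v."""
    return p_occ * p_v


def quantal_content(n_sites: int, p_occ: float, p_v: float) -> float:
    """m = n * p_v with n = N * p_occ."""
    return n_sites * p_occ * p_v


# Hypothetical parameters (assumptions for illustration only).
N_SITES = 5
P_V = 1.0                                      # fusion probability of TS vesicles, taken as ~1
p_occ_baseline = occupancy(k1=3.0, b1=7.0)     # 0.3
p_occ_facilitated = occupancy(k1=6.9, b1=3.1)  # 0.69

m_baseline = quantal_content(N_SITES, p_occ_baseline, P_V)
m_facilitated = quantal_content(N_SITES, p_occ_facilitated, P_V)
facilitation_ratio = m_facilitated / m_baseline

# With p_v fixed at 1, facilitation can never exceed 1 / p_occ_baseline,
# because occupancy cannot rise above 1.
facilitation_ceiling = 1.0 / p_occ_baseline

print(f"facilitation ratio: {facilitation_ratio:.2f}")
print(f"ceiling set by baseline occupancy: {facilitation_ceiling:.2f}")
```

      In this sketch pv is held at 1 throughout, so the whole change in quantal content comes from the change in occupancy.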

      If the Reviewer meant vesicular release or fusion probability (pv) by ‘release provability’, pv (of TS vesicles) is not a major player in STF, because the baseline pv is already higher than 0.8 even if it is most parsimoniously estimated (Fig. 2). Moreover, considering the very high refilling rate (23/s), the high double failure rate cannot be explained without assuming that pv is close to unity (Fig. S8).

      Conventional models for facilitation assume a post-AP residual Ca2+-dependent step increase in pv of RRP (Dittman et al., 2000) or reluctant vesicles (Turecek et al., 2016). Given that pv of TS vesicles is close to one, an increase in pv of TS vesicles cannot account for facilitation. The possibility of an activity-dependent increase in the fusion probability of LS vesicles (denoted as pv,LS) should be considered in two ways, depending on whether LS and TS vesicles reside in distinct pools or in the same pool. Notably, strong PPD at short ISI implies that pv,LS is near zero at the resting state. Whereas LS vesicles do not contribute to baseline transmission, short-term facilitation (STF) may be mediated by a cumulative increase in pv,LS of vesicles that reside in a distinct pool. Because the increase in pv,LS during facilitation recruits new release sites (increase in N), the variance of EPSCs should become larger as stimulation frequency increases, resulting in upward deviation from a parabola in the V-M plane, as shown in recent studies (Valera et al., 2012; Kobbersmed et al., 2020). This prediction is not compatible with our results of V-M analysis (Fig. 3), showing that EPSCs during STF fell on the same parabola regardless of stimulation frequencies. Therefore, it is unlikely that an increase in fusion probability of reluctant vesicles residing in a distinct release pool mediates STF in the present study.

      For the latter case, in which LS and TS vesicles occupy the same release sites, it is hard to distinguish a step increase in fusion probability of LS vesicles from a conversion of LS vesicles to TS. Nevertheless, our results do not support the possibility of a gradual increase in pv,LS that occurs in parallel with STF. Strong PPD, indicative of high pv, was consistently found not only in the baseline (Fig. 2 and Fig. S6) but also during the post-tetanic augmentation phase (Fig. 3D) and even during the early development of facilitation (Fig. 2D-E and Fig. 7), arguing against a gradual increase in pv,LS. One may argue that STF may be mediated by a drastic step increase of pv,LS from zero to one, but it is not distinguishable from conversion of LS to TS vesicles.
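
      The variance-mean argument above can be made concrete with a minimal sketch, assuming a simple binomial release model with uniform release probability across sites and no quantal variability (a simplification we adopt for illustration). Under that assumption the variance follows the parabola V = q*M - M^2/N. The quantal size and site numbers are hypothetical.

```python
def binomial_mean_var(n_sites: int, p_release: float, q: float):
    """Mean and variance of the EPSC amplitude under a simple binomial model."""
    mean = n_sites * p_release * q
    var = n_sites * p_release * (1.0 - p_release) * q * q
    return mean, var


def parabola_variance(mean: float, n_sites: int, q: float) -> float:
    """Variance predicted by the V-M parabola for a fixed number of sites."""
    return q * mean - mean * mean / n_sites


Q = 10.0  # hypothetical quantal size (pA)
N = 5     # hypothetical number of release sites

# Case 1: release probability changes, N is fixed.
# Every point should lie on the parabola defined by N and q.
same_n_residuals = []
for p in (0.2, 0.4, 0.6):
    mean, var = binomial_mean_var(N, p, Q)
    same_n_residuals.append(var - parabola_variance(mean, N, Q))

# Case 2: extra sites are recruited (N = 8) while the mean equals that of
# N = 5 at p = 0.6. The point should lie above the original parabola.
mean_ref, _ = binomial_mean_var(N, 0.6, Q)
p_recruited = mean_ref / (8 * Q)
mean_recruited, var_recruited = binomial_mean_var(8, p_recruited, Q)
extra_sites_excess = var_recruited - parabola_variance(mean_recruited, N, Q)

print("residuals with fixed N:", same_n_residuals)
print("excess variance with recruited sites:", extra_sites_excess)
```

      With a fixed N the residuals are zero up to rounding, whereas recruiting sites at the same mean gives a positive excess, which is the upward deviation referred to in the text.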

      To address the reviewer’s concern, we will incorporate these perspectives into the discussion and further clarify the reasoning behind our conclusions.

      <References>

      Moulder KL, Mennerick S (2005) Reluctant vesicles contribute to the total readily releasable pool in glutamatergic hippocampal neurons. J Neurosci 25:3842–3850.

      Sakaba, T (2006) Roles of the fast-releasing and the slowly releasing vesicles in synaptic transmission at the calyx of Held. J Neurosci 26(22): 5863-5871.

      Fig 3 I am confused about the interpretation of the Mean Variance analysis outcome. Since the data points follow the curve during induction of short term plasticity, aren't these suggesting that release probability and not the pool size increases? Related, to measure the absolute release probability and failure rate using the optogenetic stimulation technique is not trivial as the experimental paradigm biases the experiment to a given output strength, and therefore a change in release probability cannot be excluded.

      Under the recent definition of release probability, it can be factored into pv and pocc, which are the fusion probability of TS vesicles and the occupancy of release sites by TS vesicles, respectively. In this regard, our interpretation of the Variance-Mean results is consistent with the conventional one: different data points along a parabola represent a change in release probability (= pocc x pv). Our novel finding is that the increase in release probability should be attributed to an increase in pocc, not to an increase in pv.

      Fig4B interprets the phorbol ester stimulation to be the result of pool overfilling, however, phorbol ester stimulation has also been shown to increase release probability without changing the size of the readily releasable pool. The high frequency of stimulation may occlude an increased paired pulse depression in presence of OAG, which others have interpreted in mammalian synapses as an increase in release probability.

      In our experience at calyx of Held synapses, OAG, a DAG analogue, increased the fast-releasing vesicle pool (FRP) size (Lee JS et al., 2013), consistent with our interpretation (pool overfilling). Once the release sites are overfilled in the presence of OAG, the maximal STF (ratio of facilitated to baseline EPSCs) is expected to become lower as long as the number of release sites (N) is limited. As mentioned above, the baseline pv is already close to one, and thus it cannot be further increased by OAG. Instead, the baseline pocc seems to be increased by OAG.

      <Reference>

      Lee JS, et al., Superpriming of synaptic vesicles after their recruitment to the readily releasable pool. Proc Natl Acad Sci U S A, 2013. 110(37): 15079-84.

      The literature on Syt7 function is still quite controversial. One observation in the literature is that loss of Syt7 function in the fly synapse leads to an increase of release probability. Thus the observed changes in short term plasticity characteristics in the Syt7 KD experiments may contain a release probability component. Can the authors really exclude this possibility? Figure 5 shows for the Syt7 KD group a very prominent depression of the EPSC/IPSC with the second stimulus, particularly for the short interpulse intervals, usually a strong sign of increased release probability, as lack of pool refilling can unlikely explain the strong drop in synaptic output.

      The reviewer raises an interesting point regarding the potential link between Syt7 KD and increased initial pv, particularly in light of observations in Drosophila synapses (Guan et al., 2020; Fujii et al., 2021), in which Syt7 mutants exhibited elevated initial pv. However, it is important to note that these findings markedly differ from those in mammalian systems, where the role of Syt7 in regulating initial pv has been extensively studied. In rodents, consistent evidence indicates that Syt7 does not significantly affect initial pv, as demonstrated in several studies (Jackman et al., 2016; Chen et al., 2017; Turecek and Regehr, 2018). Furthermore, in our study of excitatory synapses in the mPFC layer 2/3, we observed an initial pv already near its maximal level, approaching a value of 1. Consequently, it is unlikely that the loss of Syt7 could further elevate the initial pv. Instead, such effects are more plausibly explained by alternative mechanisms, such as alterations in vesicle replenishment dynamics, rather than a direct influence on pv.

      <References>

      Chen, C., et al., Triple Function of Synaptotagmin 7 Ensures Efficiency of High-Frequency Transmission at Central GABAergic Synapses. Cell Rep, 2017. 21(8): 2082-2089.

      Fujii, T., et al., Synaptotagmin 7 switches short-term synaptic plasticity from depression to facilitation by suppressing synaptic transmission. Scientific reports, 2021. 11(1): 4059.

      Guan, Z., et al., Drosophila Synaptotagmin 7 negatively regulates synaptic vesicle release and replenishment in a dosage-dependent manner. Elife, 2020. 9: e55443.

      Jackman, S.L., et al., The calcium sensor synaptotagmin 7 is required for synaptic facilitation. Nature, 2016. 529(7584): 88-91.

      Turecek, J. and W.G. Regehr, Synaptotagmin 7 mediates both facilitation and asynchronous release at granule cell synapses. Journal of Neuroscience, 2018. 38(13): 3240-3251.

      Reviewer #3 (Public review):

      Summary:

      The report by Shin, Lee, Kim, and Lee entitled "Progressive overfilling of readily releasable pool underlies short-term facilitation at recurrent excitatory synapses in layer 2/3 of the rat prefrontal cortex" describes electrophysiological experiments of short-term synaptic plasticity during repetitive presynaptic stimulation at synapses between layer 2/3 pyramidal neurons and nearby target neurons. Manipulations include pharmacological inhibition of PLC and actin polymerization, activation of DAG receptors, and shRNA knockdown of Syt7. The results are interpreted as support for the hypothesis that synaptic vesicle release sites are vacant most of the time at resting synapses (i.e., p_occ is low) and that facilitation (and augmentation) components of short-term enhancement are caused by an increase in occupancy, presumably because of acceleration of the transition from not-occupied to occupied. The report additionally describes behavioural experiments where trace fear conditioning is degraded by knocking down syt7 in the same synapses.

      Strengths:

      The strength of the study is in the new information about short-term plasticity at local synapses in layer 2/3, and the major disruption of a memory task after eliminating short-term enhancement at only 15% of excitatory synapses in a single layer of a small brain region. The local synapses in layer 2/3 were previously difficult to study, but the authors have overcome a number of challenges by combining channel rhodopsins with in vitro electroporation, which is an impressive technical advance.

      Weaknesses:

      The question of whether or not short-term enhancement causes an increase in p_occ (i.e., "readily releasable pool overfilling") is important because it cuts to the heart of the ongoing debate about how to model short term synaptic plasticity in general. However, my opinion is that, in their current form, the results do not constitute strong support for an increase in p_occ, even though this is presented as the main conclusion. Instead, there are at least two alternative explanations for the results that both seem more likely. Neither alternative is acknowledged in the present version of the report.

      The evidence presented to support overfilling is essentially two-fold. The first is strong paired pulse depression of synaptic strength when the interval between action potentials is 20 or 25 ms, but not when the interval is 50 ms. Subsequent stimuli at frequencies between 5 and 40 Hz then drive enhancement. The second is the observation that a slow component of recovery from depression after trains of action potentials is unveiled after eliminating enhancement by knocking down syt7. Of the two, the second is predicted by essentially all models where enhancement mechanisms operate independently of release site depletion - i.e., transient increases in p_occ, p_v, or even N - so isn't the sort of support that would distinguish the hypothesis from alternatives (Garcia-Perez and Wesseling, 2008, https://doi.org/10.1152/jn.01348.2007).

      The apparent discrepancy in the interpretation of post-tetanic augmentation between the present and previous papers [Stevens and Wesseling (1999), Garcia-Perez and Wesseling (2008)] is an important issue that should be clarified. We noted that different meanings of ‘vesicular release probability’ in these papers are responsible for the discrepancy. We will add an explanation to the Discussion of the difference in the meaning of ‘vesicular release probability’ between the present study and previous studies [Stevens and Wesseling (1999), Garcia-Perez and Wesseling (2008)]. In summary, pv in the present study denotes the vesicular release probability of TS vesicles, while previous studies used it as the vesicular release probability of vesicles in the RRP, which includes LS and TS vesicles. Accordingly, pocc in the present study is the occupancy of release sites by TS vesicles.

      Not only double failure rate but also other failure rates upon paired pulse stimulation were best fitted at pv close to 1 (Fig. S8 and associated text). Moreover, strong PPD, indicating release of vesicles with high pv, was observed not only at the beginning of a train but also in the middle of a 5 Hz train (Fig. 2D), during the augmentation phase after a 40 Hz train (Fig 3D), and in the recovery phase after three pulse bursts (Fig. 7). Given that pv is close to 1 throughout the EPSC trains and that N does not increase during a train (Fig. 3), synaptic facilitation can be attained only by the increase in pocc (occupancy of release sites by TS vesicles). In addition, it should be noted that Fig. 7 demonstrates strong PPD during the recovery phase after depletion of TS vesicles by three pulse bursts, indicating that recovered vesicles after depletion display high pv too. Knock-down of Syt7 slowed the recovery of TS vesicles after depletion of TS vesicles, highlighting that Syt7 accelerates the recovery of TS vesicles following their depletion.

      As addressed in our reply to the first issue raised by Reviewer #2 and the third issue raised by Reviewer #3, our results do not support possibilities for recruitment of new release sites (increase in N) having low pv or for a gradual increase in pv of reluctant vesicles during short-term facilitation.  

      <Following statement will be added to _Discussion_ in the revised manuscript>

      Previous studies suggested that an increase in pv is responsible for post-tetanic augmentation (Stevens and Wesseling, 1999; Garcia-Perez and Wesseling, 2008) by observing invariance of the RRP size after tetanic stimulation. In these studies, the RRP size was estimated by hypertonic sucrose solution or as the sum of EPSCs evoked by a 20 Hz, 60-pulse train (denoted as ‘RRPhyper’). Because reluctant vesicles (called LS vesicles) can be quickly converted to TS vesicles (16/s) and are released during a train (Lee et al., 2012), it is likely that the RRP size measured by these methods encompasses both LS and TS vesicles. In contrast, we assert high pv based on the observation of strong PPD and failure rates upon paired stimulations at an ISI of 20 ms (Fig. 2 and Fig. S8). Given that single AP-induced vesicular release occurs from TS vesicles but not from LS vesicles, pv in the present study indicates the fusion probability of TS vesicles. For the same reason, pocc denotes the occupancy of release sites by TS vesicles. Note that our study does not provide a direct clue as to whether release sites are occupied by LS vesicles that are not tapped by a single AP, although an increase in the LS vesicle number may accelerate the recovery of TS vesicles. As suggested in Neher (2024), even if the number of LS plus TS vesicles is kept constant, an increase in pocc (occupancy by TS vesicles) would be interpreted as an increase in ‘vesicular release probability’ as in the previous studies (Stevens and Wesseling (1999); Garcia-Perez and Wesseling (2008)) as long as it was measured based on RRPhyper.

      Regarding the paired pulse depression: The authors ascribe this to depletion of a homogeneous population of release sites, all with similar p_v. However, the details fit better with the alternative hypothesis that the depression is instead caused by quickly reversing inactivation of Ca2+ channels near release sites, as proposed by Dobrunz and Stevens to explain a similar phenomenon at a different type of synapse (1997, PNAS, https://doi.org/10.1073/pnas.94.26.14843). The details that fit better with Ca2+ channel inactivation include the combination of the sigmoid time course of the recovery from depression (plotted backwards in Fig1G,I) and observations that EGTA (Fig2B) increases the paired-pulse depression seen after 25 ms intervals. That is, the authors ascribe the sigmoid recovery to a delay in the activation of the facilitation mechanism, but the increased paired pulse depression after loading EGTA indicates, instead, that the facilitation mechanism has already caused p_r to double within the first 25 ms (relative to the value if the facilitation mechanism was not active). Meanwhile, Ca2+ channel inactivation would be expected to cause a sigmoidal recovery of synaptic strength because of the sigmoidal relationship between Ca2+-influx and exocytosis (Dodge and Rahamimoff, 1967, https://doi.org/10.1113/jphysiol.1967.sp008367).

      The Ca2+-channel inactivation hypothesis could probably be ruled in or out with experiments analogous to the 1997 Dobrunz study, except after lowering extracellular Ca2+ to the point where synaptic transmission failures are frequent. However, a possible complication might be a large increase in facilitation in low Ca2+ (Fig2B of Stevens and Wesseling, 1999, https://doi.org/10.1016/s0896-6273(00)80685-6).

      We appreciate the reviewer's thoughtful comment regarding the potential role of Ca2+ channel inactivation in the observed paired-pulse depression (PPD). As noted by the Reviewer, Dobrunz and Stevens (1997) suggested that the high double failure rate at short ISIs in synapses exhibiting PPD can be attributed to Ca2+ channel inactivation. This interpretation seems to be based on the premise that the number of RRP vesicles does not vary from trial to trial. The number of TS vesicles, however, can be dynamically regulated depending on the parameters k1 and b1, as shown in Fig. S8, implying that the high double failure rate at short ISIs cannot be solely attributed to Ca2+ channel inactivation. Nevertheless, we acknowledge the possibility that Ca2+ channel inactivation may contribute to PPD, and we have therefore investigated it further. Specifically, we measured action potential (AP)-evoked Ca2+ transients at individual axonal boutons of layer 2/3 pyramidal cells in the mPFC using two-dye ratiometry techniques. Our analysis revealed no evidence for Ca2+ channel inactivation during a 40 Hz train of APs. This finding indicates that voltage-gated Ca2+ channel inactivation is unlikely to contribute to the pronounced PPD.

      Author response image 1 below shows how we measured the total Ca2+ increments at axonal boutons. First, we estimated the endogenous Ca2+-binding ratio from analyses of single AP-induced Ca2+ transients at different concentrations of Ca2+ indicator dye (panels A to E). Then, using the Ca2+ buffer properties, we converted free [Ca2+] amplitudes to total calcium increments for the first four AP-evoked Ca2+ transients in a 40 Hz train (panels G-I). We will incorporate these results into the revised version of the reviewed preprint to provide evidence against Ca2+ channel inactivation.
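
      As a rough illustration of the two steps described above, the sketch below assumes the standard single-compartment (added-buffer) model, in which the free Ca2+ amplitude A of a single AP transient obeys 1/A = (1 + kS + kB) / dCa_total, with kS the endogenous binding ratio and kB the binding ratio of the indicator dye. Whether this exact formulation matches the analysis in the response image is an assumption on our part. All numbers are synthetic and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept (pure Python)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def endogenous_binding_ratio(kappa_b, amplitudes):
    """Estimate kS from 1/A versus kB: kS = intercept / slope - 1."""
    slope, intercept = fit_line(kappa_b, [1.0 / a for a in amplitudes])
    return intercept / slope - 1.0


def total_increment(free_amplitude, kappa_s, kappa_b):
    """Convert a free [Ca2+] amplitude to a total calcium increment."""
    return free_amplitude * (1.0 + kappa_s + kappa_b)


# Synthetic data (assumptions): true kS = 100, total increment = 50 uM per AP.
TRUE_KAPPA_S, TRUE_TOTAL = 100.0, 50.0
dye_kappa_b = [20.0, 50.0, 100.0, 200.0]
single_ap_amplitudes = [TRUE_TOTAL / (1.0 + TRUE_KAPPA_S + kb) for kb in dye_kappa_b]

kappa_s_est = endogenous_binding_ratio(dye_kappa_b, single_ap_amplitudes)

# Hypothetical free amplitudes (uM) for four APs in a train, at one dye load.
KAPPA_B_TRAIN = 50.0
free_train = [0.33, 0.33, 0.33, 0.33]
total_train = [total_increment(a, kappa_s_est, KAPPA_B_TRAIN) for a in free_train]

print(f"estimated endogenous binding ratio: {kappa_s_est:.1f}")
print("total increments per AP:", [round(t, 2) for t in total_train])
```

      Under this model, a progressive decline of the total increments across the four APs would be the signature expected from channel inactivation, whereas equal increments would argue against it.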

      Author response image 1.

      On the other hand, even if the paired pulse depression is caused by depletion of release sites rather than Ca2+-channel inactivation, there does not seem to be any support for the critical assumption that all of the release sites have similar p_v. And indeed, there seems to be substantial emerging evidence from other studies for multiple types of release sites with 5 to 20-fold differences in p_v at a wide variety of synapse types (Maschi and Klyachko, eLife, 2020, https://doi.org/10.7554/elife.55210; Rodriguez Gotor et al, eLife, 2024, https://doi.org/10.7554/elife.88212 and refs. therein). If so, the paired pulse depression could be caused by depletion of release sites with high p_v, whereas the facilitation could occur at sites with much lower p_v that are still occupied. It might be possible to address this by eliminating assumptions about the distribution of p_v across release sites from the variance-mean analysis, but this seems difficult; simply showing how a few selected distributions wouldn't work - such as in standard multiple probability fluctuation analyses - wouldn't add much.

      We appreciate the reviewer’s insightful comments regarding the potential increase in pfusion of reluctant vesicles. It should be noted, however, that Maschi and Klyachko (2020) showed a distribution of release probability (pr) within a single active zone rather than a heterogeneity in pfusion of individual docked vesicles. Therefore both pocc and pv of TS vesicles would contribute to the pr distribution shown in Maschi and Klyachko (2020). 

      The Reviewer’s concern aligns closely with the first issue raised by Reviewer #2, which we addressed in detail. Briefly, new release sites are unlikely to be recruited during facilitation or post-tetanic augmentation, because the variance of EPSCs during and after a train fell on the same parabola (Fig. 3). Secondly, strong PPD was observed not only in the baseline but also during early and late phases of facilitation, indicating that vesicles with very high pv contribute to EPSCs throughout train stimulations (Fig. 2, 3, and 7). These findings argue against the possibilities of recruitment of new release sites harboring low pv vesicles and of a gradual increase in fusion probability of reluctant vesicles.

      To address the reviewers’ concern, we will incorporate the perspectives into Discussion and further clarify the reasoning behind our conclusions.

      In any case, the large increase - often 10-fold or more - in enhancement seen after lowering Ca2+ below 0.25 mM at a broad range of synapses and neuro-muscular junctions noted above is a potent reason to be cautious about the LS/TS model. There is morphological evidence that the transitions from a loose to tight docking state (LS to TS) occur, and even that the timing is accelerated by activity. However, 10-fold enhancement would imply that at least 90 % of vesicles start off in the LS state, and this has not been reported. In addition, my understanding is that the reverse transition (TS to LS) is thought to occur within 10s of ms of the action potential, which is 10-fold too fast to account for the reversal of facilitation seen at the same synapses (Kusick et al, 2020, https://doi.org/10.1038/s41593-020-00716-1).

      As the reviewer suggested, low external Ca2+ concentration can lower release probability (pr). Given that both pv and pocc are regulated by [Ca2+]i, low external [Ca2+] may affect not only pv but also pocc, both of which would contribute to low pr. Under such conditions, it is plausible that the baseline pr becomes lower than 0.1 due to low pv and pocc (for instance, if pv decreases from 1 to 0.5 and pocc from 0.3 to 0.1, then pr = 0.05), so that pr (= pv x pocc) has room to increase by a factor of ten (to 0.5, for example) through short-term facilitation as cytosolic [Ca2+] accumulates during a train.
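
      The arithmetic in the example above can be checked with a few lines. The values of pv and pocc are the illustrative ones given in the text; the "headroom" function is our own shorthand for the largest fold increase that remains possible given that pr cannot exceed 1.

```python
def release_probability(p_v: float, p_occ: float) -> float:
    """p_r = p_v * p_occ."""
    return p_v * p_occ


def facilitation_headroom(p_r_baseline: float) -> float:
    """Largest possible fold increase in p_r, since p_r cannot exceed 1."""
    return 1.0 / p_r_baseline


# Illustrative values from the text.
p_r_normal_ca = release_probability(p_v=1.0, p_occ=0.3)  # normal external Ca2+
p_r_low_ca = release_probability(p_v=0.5, p_occ=0.1)     # low external Ca2+

tenfold_target = 10 * p_r_low_ca

print(f"p_r, normal Ca2+: {p_r_normal_ca:.2f}, headroom: {facilitation_headroom(p_r_normal_ca):.1f}x")
print(f"p_r, low Ca2+:    {p_r_low_ca:.2f}, headroom: {facilitation_headroom(p_r_low_ca):.1f}x")
print(f"p_r after tenfold facilitation in low Ca2+: {tenfold_target:.2f}")
```

      A tenfold increase from the low-Ca2+ baseline therefore stays within the allowed range, whereas the same fold change is not available from the normal-Ca2+ baseline.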

      If pv is close to one, pr depends on pocc, and thus facilitation depends on the number of TS vesicles just before the arrival of each AP of a train. Thus, post-train recovery from facilitation would depend on restoration of the equilibrium between TS and LS vesicles to baseline. Even if the transition between LS and TS vesicles is very fast (tens of ms), the equilibrium involved in de novo priming (reversible transitions between the recycling vesicle pool and partially docked LS vesicles) seems to be much slower (13 s in Fig. 5A of Wu and Borst 1999). Thus, we can consider a two-step priming model (recycling pool -> LS -> TS), which comprises a slow 1st step (-> LS) and a fast 2nd step (-> TS). Under the framework of the two-step model, the slow 1st step (de novo priming step) is the rate-limiting step regulating the development and recovery kinetics of facilitation. Given that the on and off rates for Ca2+ binding to Syt7 are slow, it is plausible that Syt7 may contribute to short-term facilitation (STF) by Ca2+-dependent acceleration of the 1st step (as shown in Fig. 9). During train stimulation, LS vesicles would slowly accumulate in a Syt7- and Ca2+-dependent manner, and this increase in LS vesicles would shift the LS/TS equilibrium towards TS, resulting in STF. After tetanic stimulation, the recovery kinetics from facilitation would be limited by the slow recovery of LS vesicles.

      <Reference>

      Wu, L.-G. and Borst J.G.G. (1999) The reduced release probability of releasable vesicles during recovery from short-term synaptic depression. Neuron, 23(4): 821-832.

      Individual points:

      (1) An additional problem with the overfilling hypothesis is that syt7 knockdown increases the estimate of p_occ extracted from the variance-mean analysis, which would imply a faster transition from unoccupied to occupied, and would consequently predict faster recovery from depression. However, recovery from depression seen in experiments was slower, not faster. Meanwhile, the apparent decrease in the estimate of N extracted from the mean-variance analysis is not anticipated by the authors' model, but fits well with alternatives where p_v varies extensively among release sites because release sites with low p_v would essentially be silent in the absence of facilitation.

      Slower recovery from depression observed in the Syt7 knockdown (KD) synapses (Fig. 7) may result from a deficiency in activity-dependent acceleration of TS vesicle recovery. Although basal occupancy was higher in the Syt7 KD synapses, this does not indicate a faster activity-dependent recovery.

      Nor does higher baseline occupancy always imply faster recovery of PPR. In fact, PPR recovery was slower in Syt7 KD synapses than in WT ones (18.5 vs. 23/s). Under the framework of the simple refilling model (Fig. S8Aa), the baseline occupancy and PPR recovery rate are calculated as k1 / (k1 + b1) and (k1 + b1), respectively. The baseline occupancy depends on the ratio k1/b1, while the PPR recovery rate depends on the absolute values of k1 and b1. Based on the pocc and PPR recovery time constants of WT and KD synapses, we expect a higher k1/b1 but a lower (k1 + b1) in Syt7 KD synapses compared to WT ones.
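
      The two relations above can be inverted to recover k1 and b1 from an occupancy and a recovery rate. The sketch below does this under the simple refilling model as stated; the recovery rates (23/s and 18.5/s) are taken from the text, while both occupancy values are hypothetical placeholders chosen only so that KD occupancy exceeds WT occupancy.

```python
from typing import Tuple

def refilling_rates(p_occ: float, recovery_rate: float) -> Tuple[float, float]:
    """Solve p_occ = k1 / (k1 + b1) and recovery_rate = k1 + b1 for (k1, b1)."""
    k1 = p_occ * recovery_rate
    b1 = (1.0 - p_occ) * recovery_rate
    return k1, b1

# Recovery rates (per second) are quoted in the response; the occupancies
# are hypothetical values used only for illustration.
k1_wt, b1_wt = refilling_rates(p_occ=0.3, recovery_rate=23.0)
k1_kd, b1_kd = refilling_rates(p_occ=0.5, recovery_rate=18.5)

print(f"WT: k1={k1_wt:.2f}/s b1={b1_wt:.2f}/s k1/b1={k1_wt / b1_wt:.2f}")
print(f"KD: k1={k1_kd:.2f}/s b1={b1_kd:.2f}/s k1/b1={k1_kd / b1_kd:.2f}")
```

      With these assumed occupancies, the KD case has the larger k1/b1 ratio but the smaller k1 + b1, which is the combination the response expects.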

      The lower number of release sites (N) in Syt7 KD synapses was not anticipated. As you suggested, such a low N might be ascribed to little recruitment of release sites during a train in KD synapses. But our results do not support this model. If silent release sites were recruited during a train, the variance should deviate upward from the parabola predicted under a fixed N (Valera et al., 2012; Kobbersmed et al. 2020). This was not the case in our data (Fig. 3). In the first version of the manuscript, we argued against this possibility in lines 203-208.
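
      The expected upward deviation can be illustrated with the textbook binomial variance-mean relation, var = q x mean - mean^2 / N. This is a sketch under simplifying assumptions (uniform quantal size q, uniform release probability, no quantal variability); it is not the analysis used in the manuscript, and all numbers are hypothetical.

```python
def binomial_variance(mean: float, q: float, n_sites: float) -> float:
    """Variance predicted by a simple binomial model: q * mean - mean**2 / N."""
    return q * mean - mean ** 2 / n_sites

Q = 1.0          # hypothetical quantal size (arbitrary units)
N_FIXED = 10.0   # hypothetical fixed number of release sites
MEAN_EPSC = 4.0  # hypothetical mean amplitude during facilitation

# Same mean amplitude, with and without recruitment of additional sites.
var_fixed = binomial_variance(MEAN_EPSC, Q, N_FIXED)
var_recruited = binomial_variance(MEAN_EPSC, Q, 2 * N_FIXED)

print(f"variance on the fixed-N parabola: {var_fixed:.2f}")
print(f"variance if N doubles:            {var_recruited:.2f}")
```

      At the same mean amplitude, a larger N gives a larger variance, so points recorded after recruitment would lie above the fixed-N parabola.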

      As discussed in both the Results and Discussion sections, the baseline EPSC was unchanged by KD (Fig. S3) because of complementary changes in the number of docking sites and their baseline occupancy (Fig. 6). These findings suggest that Syt7 may be involved in maintaining additional vacant docking sites, which could be overfilled during facilitation. It remains to be determined whether the decrease in docking sites in Syt7 KD synapses is related to the specific localization of Syt7 at the plasma membrane of active zones, as proposed in previous studies (Sugita et al., 2001; Vevea et al., 2021).

      (2) Figure S4A: I like the TTX part of this control, but the 4-AP part needs a positive control to be meaningful (e.g., absence of TTX).

      The reason why we used 4-AP in the presence of TTX was to increase the length constant of axon fibers and to facilitate the conduction of local depolarization in the illumination area to axon terminals. The lack of EPSCs in the presence of 4-AP and TTX indicates that the illumination area is far enough from the axon terminals that local depolarization induced by optical stimulation does not evoke synaptic transmission. This methodology has been employed in previous studies, including the work of Little and Carter (2013).

      <Reference>

      Little JP and Carter AG (2013) Synaptic mechanisms underlying strong reciprocal connectivity between the medial prefrontal cortex and basolateral amygdala. J Neurosci, 33(39): 15333-15342.

      (3) Line 251: At least some of the previous studies that concluded these drugs affect vesicle dynamics used logic that was based on some of the same assumptions that are problematic for the present study, so the reasoning is a bit circular.

      (4) Line 329 and Line 461: A similar problem with circularity for interpreting earlier syt7 studies.

      (Reply to #3 and #4) We selected the target molecules as candidates based on their well-characterized roles in vesicle dynamics, and aimed to investigate what aspects of STP are affected by these molecules in our experimental context. For example, we found that the baseline pocc and short-term facilitation (STF) are enhanced by the baseline DAG level and train stimulation-induced PLC activation, respectively. Notably, the effect of dynasore informed us that slow site clearing is responsible for the late depression of 40 Hz train EPSCs. The knock-down experiments also provided us with information on the critical role of Syt7 in replenishment of TS vesicles. These approaches do not deviate from standard scientific reasoning but rather build upon prior knowledge to formulate and test hypotheses.

      Importantly, our conclusions do not rely solely on the assumption that altering the target molecule impacts synaptic transmission. Instead, our conclusions are derived from a comprehensive analysis of diverse outcomes obtained through both pharmacological and genetic manipulations. These interpretations align closely with prior literature, further validating our conclusions.

      Therefore, the use of established studies to guide candidate selection and the consistency of our findings with existing knowledge do not represent a logical circularity but rather a reinforcement of the proposed mechanism through converging lines of evidence.

    1. Author response:

      We were pleased to read the positive comments regarding our manuscript and thank the reviewers and editors for the constructive feedback which we believe will be very helpful to improve the current version of the manuscript.

      Prior to addressing all comments in a full response, we provide a response to three issues that were raised in this provisional plan for revision: validation of the tracking algorithm, biological replicates, and mosquito survival.

      (1) Validation of the tracking algorithm:

      Reviewer 2 mentions that there is "No external validation for the flight tracking algorithm using manual annotation". We will address this comment in our full response by creating a manually labelled dataset to validate our detection algorithm.

      However, we would like to point out two important points:

      i) Quantifying the accuracy of a detection algorithm using a manually annotated set is indeed common practice in deep/machine learning algorithms in which manually annotated data are used to train the algorithm, and another set of manually annotated data is used to validate it. However, our detection and tracking algorithm is based on conventional computer vision techniques (not using any deep learning) that have been in use for several decades. Given that these algorithms are completely transparent and deterministic (as opposed to deep learning algorithms that are difficult to dissect and are created using partly stochastic processes) it is not common practice to use human annotations for validation. However, to address Reviewer 2's comment we will provide validation metrics in our full response.

      ii) We would furthermore like to note that our main metrics of interest (e.g. fraction of mosquitoes flying) depend only on accurately detecting mosquitoes and quantifying movement; their accuracy is not affected by potential identity swaps (the typical bottleneck in tracking algorithms).

      (2) Replicates:

      Reviewer 3 states that "Most experiments are only done with single replicates". This statement is not accurate: In Figure 2 we used 3 independent biological replicates for 4 colonies, 2 of which are Aaa and 2 are Aaf. We indeed provide additional data for 6 more colonies using a single replicate. Combined, this data set comprises 588 days of continuous recordings. For Figures 3 and 4 we have 2 replicates for each perturbation experiment. For Figure 5 we provided 3 replicates for the host-seeking experiments. As outlined, the vast majority of our experiments have multiple replicates. We realize this may not have been described clearly enough, and we will clarify it in the revised manuscript.

      (3) Mosquito survival:

      Below we provide survival data for the data shown in Figures 1 - 4; we will include these data as supplementary material. Overall, we note that mortality for all experiments was similar to the 'baseline' mortality we observe in our standard colony maintenance procedures. After three weeks, we typically observed that 70% of mosquitoes were still alive.

      Author response image 1.

      Survival curves for the data presented in Figures 1 - 4 of the main text. Day 0 indicates the day on which the BuzzWatch experiment started.

    1. Author response:

      Reply to Reviewer #1 (Public Review):

      The post-processing increases the number of putative neoantigens. As shown in Author response image 1, this is done through data augmentation, i.e., “mutations” that replace individual amino acids in a sequence with their most similar amino acid in the BLOSUM62 embedding. If most of the mutated sequences receive a positive prediction (binarized at a score > 0.5), the sequence changes its prediction.

      Author response image 1.

      Post-processing pipeline to increase the number of putative neoantigens. Sequences can either be predicted using the forward method, for which a raw score is produced, or be passed to a majority vote over the ensemble predictions of similar protein sequences.
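
      One possible reading of this majority-vote step is sketched below. It is not the NAP-CNB implementation: the substitution map is a small hand-written placeholder rather than a lookup in the BLOSUM62 embedding, `toy_score` is a hypothetical stand-in for the trained predictor, and the fallback for sequences without variants is our own assumption.

```python
from typing import Callable, Dict, List

# Placeholder "most similar residue" map (an assumption for illustration);
# the described pipeline derives this from the BLOSUM62 embedding instead.
MOST_SIMILAR: Dict[str, str] = {
    "I": "V", "V": "I", "K": "R", "R": "K", "D": "E", "E": "D",
}

def single_substitution_variants(seq: str) -> List[str]:
    """Return every copy of seq with one residue replaced by its mapped neighbour."""
    variants = []
    for i, residue in enumerate(seq):
        substitute = MOST_SIMILAR.get(residue)
        if substitute is not None:
            variants.append(seq[:i] + substitute + seq[i + 1:])
    return variants

def majority_vote(seq: str, score: Callable[[str], float],
                  threshold: float = 0.5) -> bool:
    """Label seq positive if more than half of its variants score above threshold."""
    variants = single_substitution_variants(seq)
    if not variants:
        # Assumed fallback: use the direct prediction when no variant exists.
        return score(seq) > threshold
    positives = sum(score(v) > threshold for v in variants)
    return positives > len(variants) / 2

def toy_score(seq: str) -> float:
    """Hypothetical scorer: fraction of residues that are I or V."""
    return sum(residue in "IV" for residue in seq) / len(seq)

print(majority_vote("VIVK", toy_score))
print(majority_vote("KDER", toy_score))
```

      Passing the scorer in as a function keeps the voting logic independent of the model, so the same sketch would work with any predictor that returns a score between 0 and 1.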

      In this article, we obtain the following candidates after post-processing.

      Author response table 1.

      As mentioned, the prediction column shows a binary label. The full list, which contained 402 sequences, did not include any other sequences that met the majority-vote criteria.

      As noted by the reviewer, Table 3 of our original paper includes the scores of the direct prediction, which has four sequences in common with the post-processing criteria (*Pnp, *Adar, *Lrrc28 and *Nr1h2). * indicates the mutated form of the peptide, i.e., the neoantigen.

      We selected the top 4 predicted antigens present both by direct prediction and after post-processing (*Pnp, *Adar, *Lrrc28 and *Nr1h2) (Wert-Carvajal et al. 2021), but we encountered difficulty in synthesizing *Nr1h2 (mutated Nr1h2), and thus it could not be included in the study.

      We also decided to evaluate the immunogenicity of *Wiz, which was identified as a potential TNA only after post-processing. *Wiz exhibited lower levels of immunogenicity compared to *Pnp, *Adar, and *Lrrc28. However, unlike these, *Wiz is highly expressed in the tumor, and vaccination with *Wiz provided the strongest protection levels. These findings led us to incorporate post-processing into the NAP-CNB platform.

      We chose *Herc6 as a mutated antigen predicted not to be a TNA over other candidates because its expression in the tumor was similar to that of *Wiz.

      Depending on the experiment, we used 4 or 5 animals per group (this will be clarified in the revised version).

      The software used for statistical analysis was GraphPad Prism.

      Reply to Reviewer #2 (Public Review):

      This is true: binding affinity does not always predict immune responses, but in most cases high-affinity peptides are immunogenic. There are of course other parameters that drive the effective priming of tumor-reactive CD8+ T cells through antigen cross-presentation, but the mechanisms of antigen presentation are not yet completely understood. High-affinity peptides are desirable as good candidates in neoantigen-based vaccines.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors report a new bioinformatics pipeline ("SPICE") to predict pairwise cooperative binding-sites based on input ChIP-seq data for transcription factor (TF)-of-interest, analyzed against DNA-binding sites (DNA motifs) in a database (HOCOMOCO). The pipeline also predicts the optimal distance between the paired binding sites. The pipeline correctly predicted known/reported transcription factor cooperations, and also predicted new cooperations, not yet reported in literature. The authors choose to follow up on the predicted interaction between Ikaros and Jun. Using ChIP-seq in mouse B cells, they show extensive overlap in binding regions between Ikaros and Jun in LPS+IL21 stimulated cells. In a human B-lineage cell line (MINO) they show that anti-Ikaros Ab can co-immunoprecipitate Jun protein, and that the MINO cell extracts contain protein(s) that can bind to the CNS9 probe (conserved region upstream of IL10 gene), and that binding is lost upon mutation of two basepairs in the AP1 binding motif, and reduced upon mutation of two basepairs in the non-canonical Ikaros binding motif. Part of this protein complex is super-shifted with an anti-Jun antibody, and more DNA is shifted with addition of an anti-Ikaros antibody.

      The authors perform EMSA showing that recombinant Jun can bind to the tested DNA-region (IL10 CNS9) and that addition of recombinant Ikaros (or anti-Ikaros antibody in Fig 3E) can enhance binding (increase amount of DNA shifted). The authors lastly show that the IL10 CNS9 DNA region can enhance transcription in B- and T-cells with a luciferase reporter assay, and that 2 bp mutation of the Ikaros or Jun DNA motifs greatly reduce or abolish this activity.

      This is interesting work, with two main contributions: The SPICE pipeline (if made available to the scientific community), and the report of interaction between Ikaros and Jun. However, the distinction between DNA motifs, and the proteins actually binding and having a biological function, should be made clear consistently throughout the manuscript. The same DNA motifs can be bound by multiple factors, for instance within transcription factor families with highly homology in the DNA-binding regions of the proteins.

      The reviewer has correctly assessed the content of our manuscript.

      Some specific points:

      SPICE: It is unclear if this is uploaded somewhere to be available to the scientific community.

      Thanks for this comment. We will upload the SPICE pipeline and its associated scripts (R and shell) via GitHub.

      It was unclear if Ikaros-Jun interaction was initially found from primary Jun ChIP-seq (and secondary Ikaros motif from HOCOMOCO) or from primary Ikaros CHIP-seq (and secondary Jun motif from HOCOMOCO). And - what were the two DNA motifs (primary and secondary, and their distance) from the SPICE analysis?

      The IKZF1-JUN interaction was found from primary JUN ChIP-seq data and searching for secondary IKZF1 motifs identified in the HOCOMOCO database. We will provide the primary and secondary motifs in our revised manuscript.

      Authors have mostly careful considerations and statements. One additional comment is that binding does not equal function (Fig 2D), and that opening of chromatin (by any other factor(s)) can give DNA-binding factors (like Ikaros and Jun) the opportunity to bind, without functional consequence for the biological process studied.

      We appreciate that the reviewer believes our considerations and statements are careful. We agree that opening of chromatin can give factors the opportunity to bind, and we now make this point in the manuscript.

      Figure 2E: Ikaros is reported to be expressed at baseline in murine B cells, yet the Ikaros ChIP-seq in unstimulated cells had what looks to be no significant or low peaks. LPS stimulation induced strong Ikaros ChIP-seq signal. A western blot showing the Ikaros protein levels in the 3 conditions could help understand if the binding pattern is due to protein expression level induction. Similar for Jun (western in the 3 conditions), which seemed to mainly bind in the LPS+IL21 condition. Furthermore, as also suggested below, tracks showing Ikaros and Jun binding from all conditions (unstimulated, LPS only and LPS+IL21 stimulated cells), at select genomic loci, would be helpful in illustrating this difference in signal between the different cell conditions. This is relevant in regards to the point of cooperativity of binding.

      The main point of the paper was to show functional cooperation and proximity of binding. However, the use of purified JUN and Ikaros proteins suggests cooperative binding. Exhaustive evaluation of the JUN-Ikaros association is left for future studies.

      ChIP-seq in mouse B cells showed that Ikaros bound strongly in LPS stimulated cells, in the (relative) absence of Jun binding (Fig. 2C). However, in EMSA (Fig 3C), there is no binding when the AP1 site is mutated, and the authors describe this as Ikaros binding site. What does the Ikaros binding look like at this genomic location in LPS (only) stimulated cells? The authors could show the same figure as in Fig 2F but show Ikaros and Jun ChIP-seq tracks at IL10 CNS9 locus from all conditions to compare binding in unstimulated, LPS and LPS+IL21 cells.

      As requested, we now show Ikaros and Jun ChIP-seq tracks from unstimulated, LPS-treated, and LPS + IL21-treated cells. Both Ikaros and cJUN were bound to the Il10 upstream CNS9 region with LPS treatment of cells (see Author response image 1, highlighted in red box), but binding was weaker than that observed with LPS + IL21.

      Author response image 1.

      Also: How does this reconcile with the luciferase assay in Fig 4E, where LPS (only) stimulation is used, which in Fig 2E only/mainly induced Ikaros, and not Jun ChIP-seq signal (while EMSA indicate Ikaros cannot bind the site alone, but can enhance Jun-dependent binding).

      As shown above, in the LPS (only) condition, both IKZF1 (Ikaros) and cJUN bind to the Il10 CNS9 locus. Thus, this is not in conflict with our luciferase assay data in Fig. 4E, which showed that Ikaros is dependent on AP-1 binding. Moreover, the AP-1 site in Fig. 4D and 4E can be bound by other AP-1 factors as well, such as JUND, JUNB, BATF, etc. These points can be made in the manuscript. These factors can potentially compete with cJUN binding, and their roles remain to be explored.

      Comment on statements in results section: The luciferase assays in B and T cells do not demonstrate the role of the proteins Ikaros or Jun directly (page 10, lines 208 and surrounding text). The assay measures an effect of the DNA sequences (implying binding of some transcription factor(s)), but does not identify which protein factors bind there.

      We agree with the reviewer. It is reasonable and even likely that different family members may be partially redundant. This point is now made in our revised manuscript.

      Lastly, the authors only discuss Ikaros (using the term IKZF1 which is the gene symbol for the Ikaros protein). There are other Ikaros family members that have high homology and that are reported to bind similar DNA sequences (for instance Aiolos and Helios), which are expressed in B-cells and T-cells. A discussion of this is of relevance, as these are different proteins, although belonging to the same family (the Ikaros-family) of transcription factors. For instance, western for Aiolos and Helios will likely detect Aiolos in the B cells used, and Helios in the T cells used.

      We agree with the reviewer. As requested, we now discuss the possibility that Aiolos or Helios may also contribute.

      Reviewer #2 (Public Review):

      The study is performed with old tool Spamo (12 year ago), source data from Encode (2010-2012), even peak caller tool version MACS is old ~ 2013. De novo motif search tool is old too (new one STREME is not mentioned). Any composite element search tool published for the recent 12 years are not cited, there are some issues in data analysis in presentation. Almost all references are from about 8-10 year ago (the most recent date is 2019)

      The title is misleading

      Instead of “A new pipeline SPICE identifies novel JUN-IKZF1 composite elements”

      It should be written as “Application of SpaMo tool identifies novel JUN-IKZF1 composite elements”

      It reflects the pipeline better but honestly shows that the novelty is missed.

      Regarding the above two points, we respectfully disagree with the reviewer. Although SpaMo was used, the pipeline we developed is new and our findings are distinctive. The pipeline can systematically screen for and predict novel protein-protein binding complexes, our discovery of the IKZF1-JUN composite element is new, and the biological findings and validation are distinctive. This point is now made in the revised manuscript. As requested, we have added some additional references.

      The study was performed on too old data from ENCODE, authors mentioned 343 Encode ChIP-Seq libraries, but authors even did not care even about to set for each library the name of target TF (Figure 1E, Figure S2, Table 2).

      Although we used ENCODE data, in part because we initially developed the algorithm with them, those data are valid, and using them allowed us to demonstrate the functionality of SPICE, which is versatile and can be used on datasets of one’s choice as well. As requested, in the revised manuscript we have added the names of the TFs in Figs, Fig. S2, and Table 1.

      Reviewer #3 (Public Review):

      The authors of this study have designed a novel screening pipeline to detect DNA motif spacing preferences between TF partners using publicly available data. They were able to recapitulate previously known composite elements, such as the AP-1/IRF4 composite elements (AICE) and predict many composite elements that are expected to be very useful to the community of researchers interested in dissecting the regulatory logic of mammalian enhancers and promoters. The authors then focus on a novel, SPICE predicted interaction between JUN and IKZF1, and show that under LPS and IL-21 treatment, JUN and IKZF1 in B cells have significant overlap in their genomic localization. Next, to know whether the two TFs physically interact, a co-immunoprecipitation experiment was performed. While JUN immunoprecipitated with an anti-IKZF1 antibody, curiously IKZF1 did not immunoprecipitate with an anti-JUN antibody. Finally, EMSA and luciferase experiments were performed to show that the two TFs bind cooperatively at an IL20 upstream probe.

      The reviewer has described the basic results of the study.

      Major strengths:

      1) SPICE was able to recapitulate previously known composite elements, such as the AP-1/IRF4 composite elements (AICE).

      2) Under LPS and IL-21 treatment, JUN and IKZF1 in B cells have significant overlap in their genomic localization. This is very good supporting evidence for the efficacy of SPICE in detecting TF partners.

      We are glad that the reviewer believes that SPICE is effective in detecting TF partners.

      Major weaknesses:

      1) The authors fail to convincingly show that IKZF1 and Jun physically interact. A quantitative measurement of their interaction strength would have been ideal.

      We agree that it is not conclusive that the factors interact directly as opposed to binding to nearby sites on DNA, which is what SPICE was intended to detect. We never intended to claim that we established a definite physical interaction. The coIP worked in one direction, but not reliably in the other, even though we have tried a total of four different antibodies. We now mention in the revised manuscript that we have tried the additional anti-JUN antibodies, cJun (60A8, CST) and JunD (D17G2, CST).

      2) The super-shift experiment to show that the proteins bound to their EMSA probe were indeed IKZF1 and JUN are not very convincing and would benefit from efforts to quantify the shift (Figure 3E). Nuclear extracts from cells with single or double CRISPR knock outs of the two TFs would have been ideal.

      We agree that using single or double knockouts would be helpful, but other Ikaros family or Jun family members could be involved, so such studies might not be definitive. That is why we used purified proteins to show apparent cooperative binding (Figure 4C).

      3) There is a second band beneath the more prominent band in the EMSA experiment with recombinant IKZF1 and JUN (Figure 4C). This second band is most probably bound by IKZF1 because it becomes weaker when the IKZF1 site is mutated and is completely absent when only JUN is added. This is completely ignored by the authors. Therefore, experiments with EMSA fail to convincingly show that IKZF1 and Jun bind cooperatively. They could just as well bind independently to the two sites.

      The second band has a faster mobility and might relate to IKZF1, although this is difficult to know. We comment on this band in the revised manuscript. As noted above, the purified protein experiments do suggest cooperativity. However, our overall intent was to identify factors binding in proximity, which SPICE has successfully done, even if the binding was “independent”.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      (1) It is a nice study but lacks some functional data required to determine how useful these alleles will be in practice, especially in comparison with the figure line that stimulated their creation.

      We are grateful for this comment. Regarding the usefulness of these alleles, Figure 3 shows that specific and efficient genetic manipulation of one cell subpopulation can be achieved by crossing the DreER mouse strain with the roxCre mouse strain. In addition, Figure 6 shows that R26-loxCre-tdT can effectively ensure Cre-loxP recombination on some gene alleles for genetic manipulation. The expression of the tdT protein is aligned with the expression of the Cre protein (Alb roxCre-tdT and R26-loxCre-tdT, Figure 2 and Figure 5), which ensures the accuracy of the tracing experiments. We believe more functional data can be shown in future articles that use the mouse lines mentioned in this manuscript.

      (2) The data in Figure 5 show strong activity at the Confetti locus, but the design of the newly reported R26-loxCre line lacks a WPRE sequence that was included in the iSure-Cre line to drive very robust protein expression.

      Thank you for raising this point. In the R26-loxCre-tdT knock-in strategy, the WPRE sequence is added behind the loxCre-P2A-tdT sequence.

      (3) the most valuable experiment for such a new tool would be a head-to-head comparison with iSure (or the latest iSure version from the Benedito lab) using the same CreER and target foxed allele. At the very least a comparison of Cre protein expression between the two lines using identical CreER activators is needed.

      Following the reviewer’s suggestion, we will compare iSuRe-Cre with R26-loxCre-tdT using Alb-CreER and the R26-Confetti target allele in the revised manuscript.

      (4) Why did the authors not use the same driver to compare mCre 1, 4, 7, and 10? The study in Figure 2 uses Alb-roxCre for 1 and 7 and Cdh5-roxCre for 4 and 10, with clearly different levels of activity driven by the two alleles in vivo. Thus whether mCre1 is really better than mCre4 or 10 is not clear.

      Thank you for raising this concern. After screening out four robust versions of mCre, we generated these four roxCre knock-in mice. We could not predict which mCre would be the most robust in vivo; one or two mCre versions might work efficiently. For example, if Alb-mCre1 were competitive with Cdh5-mCre10, we could use them to target genes in different cell types, broadening the potential utility of these mice.

      (5) Technical details are lacking. The authors provide little specific information regarding the precise way that the new alleles were generated, i.e. exactly what nucleotide sites were used and what the sequence of the introduced transgenes is. Such valuable information must be gleaned from schematic diagrams that are insufficient to fully explain the approach.

      Thank you for your careful suggestions.

      We will provide schematic figures as well as nucleotide sequences for mice generation in the revised manuscript.

      Reviewer #2 (Public Review):

      (1) The scenario where the lines would demonstrate their full potential compared to existing models has not been tested.

      We are grateful for this suggestion. We will compare iSuRe-Cre with R26-loxCre-tdT using Alb-CreER and the R26-Confetti target allele in the revised manuscript.

      (2) The challenge lies in performing such experiments, as low doses of tamoxifen needed for inducing mosaic gene deletion may not be sufficient to efficiently recombine multiple alleles in individual cells while at the same time accurately reporting gene deletion. Therefore, a demonstration of the efficient deletion of multiple floxed alleles in a mosaic fashion would be a valuable addition.

      Thank you for your constructive comments. Mosaic analysis using sparse labeling and efficient gene deletion will be a future direction for the roxCre and loxCre strategies. We will include some discussion of using such a strategy in the revised manuscript.

      (3) When combined with the confetti line, the reporter cassette will continue flipping, potentially leading to misleading lineage tracing results.

      Thank you for your professional comments. Indeed, the Confetti reporter used in this study can continue flipping, which could lead to misleading lineage tracing results. We used R26-Confetti to demonstrate the robustness of mCre for recombination. Some multicolor mouse lines that do not flip have been published, for example, R26-Confetti2 (10.1038/s41588-019-0346-6) and Rainbow (10.1161/CIRCULATIONAHA.120.045750). These reporters could be used for tracing Cre-expressing cells without concerns about flipping of the reporter cassette.

      (4) Constitutive expression of Cre is also associated with toxicity, as discussed by the authors in the introduction.

      Thank you for your professional comments. The toxicity of constitutive Cre expression and the toxicity associated with tamoxifen treatment in CreER mouse lines (10.1038/s44161-022-00125-6) are known to the field. This study cannot resolve the toxicity of constitutive Cre expression. Many mouse lines with constitutive Cre driven by different promoters are used across various fields and present similar toxicity. To solve this issue, it would be possible to construct a new strategy that enables the removal of Cre after its expression.

      Reviewer #3 (Public Review):

      (1) Although leakiness is rather minor according to the original publication and the senior author of the study wrote in a review a few years ago that there is no leakiness (https://doi.org/10.1016/j.jbc.2021.100509).

      Thank you so much for your careful check. In that review (https://doi.org/10.1016/j.jbc.2021.100509), the comments on iSuRe-Cre were written from a reader's perspective, and the summary was based entirely on the original publication (10.1038/s41467-019-10239-4). We have since tested iSuRe-Cre ourselves. We detected some leakiness in the heart and muscle, but hardly any in other tissues, as shown in the following figure.

      Author response image 1.

      Leakiness in Alb CreER;iSuRe-Cre mouse line. Pictures are representative results for 5 mice. Scale bars, white 100 µm.

      (2) I would have preferred to see a study, which uses the wonderful new tools to address a major biological question, rather than a primarily technical report, which describes the ongoing efforts to further improve Cre and Dre recombinase-mediated recombination.

      We gratefully appreciate your valuable comment. The roxCre and loxCre mice mentioned in this study provide more effective methods for inducible genetic manipulation in studying gene function. We hope that the application of our new genetic tools could help address some major biological questions in different biomedical fields in the future.

      (3) Very high levels of Cre expression may cause toxic effects as previously reported for the hearts of Myh6-Cre mice. Thus, it seems sensible to test for unspecific toxic effects, which may be done by bulk RNA-seq analysis, cell viability, and cell proliferation assays. It should also be analyzed whether the combination of R26-roxCre-tdT with the Tnni3-Dre allele causes cardiac dysfunction, although such dysfunctions should be apparent from potential changes in gene expression.

      We apologize for mistakenly writing R26-roxCre-tdT instead of R26-loxCre-tdT in our manuscript. We have not generated an R26-roxCre-tdT mouse line. We also thank the reviewer for raising concerns about the toxicity of high Cre expression. The toxicity of constitutive Cre expression and the toxicity of tamoxifen treatment in CreER mouse lines (10.1038/s44161-022-00125-6) are known to the field. The present study cannot resolve the toxicity of constitutive Cre expression. Many mouse lines with constitutive Cre driven by different promoters are used across various fields and present similar toxicity. To address this issue, it would be possible to develop a new strategy that enables the removal of Cre after its expression.

      (4) Is there any leakiness when the inducible DreER allele is introduced but no tamoxifen treatment is applied? This should be documented. The same also applies to loxCre mice.

      In this study, we generated new mouse tool lines, including Alb roxCre1-tdT, Cdh5 roxCre4-tdT, Alb roxCre7-GFP, Cdh5 roxCre10-GFP and R26-loxCre-tdT. As shown in supplementary figure 1, supplementary figure 2, and figure 4D, none of these lines is leaky. Therefore, any leakiness observed when an inducible DreER or CreER allele is introduced would derive from the DreER or CreER allele itself. We will add the relevant experimental data in the revision.

      (5) It would be very helpful to include a dose-response curve for determining the minimum dosage required in Alb-CreER; R26-loxCre-tdT; Ctnnb1flox/flox mice for efficient recombination.

      Thank you for your suggestion. We understand the reviewer’s concern and will generate a dose-response curve in the revision.

      (6) In the liver panel of Figure 4F, tdT signals do not seem to colocalize with the VE-cad signals, which is odd. Is there any compelling explanation?

      Because the file-upload website limits file size, image compression left some signals unclear. Zoomed-out images are shown below. The staining in Figure 4F will be optimized and high-resolution images will be provided in the revision.

      Author response image 2.

      (7) The authors claim that "virtually all tdT+ endothelial cells simultaneously expressed YFP/mCFP" (right panel of Figure 5D). Well, it seems that the abundance of tdT is much lower compared to YFP/mCFP. If the recombination of R26-Confetti was mainly triggered by R26-loxCre-tdT, the expression of tdT and YFP/mCFP should be comparable. This should be clarified.

      Thank you so much for your careful check. We examined these signals carefully and did not find a “much lower” tdT signal. Because the file-upload website limits file size, image compression left some signals unclear. We attach clear, high-resolution images here. The following figure shows how we split the tdT signal and compared it with YFP/mCFP.

      Author response image 3.

      (8) In several cases, the authors seem to have mixed up "R26-roxCre-tdT" with "R26-loxCre-tdT". There are errors in #251 and #256. Further errors appear in the passage from line #278 to #301. In the lines #297 and #300 it should probably read "Alb-CreER; R26-loxCre-tdT;Ctnnb1flox/flox" rather than "Alb-CreER;R26-tdT2;Ctnnb1flox/flox".

      We are grateful for these careful observations. We have corrected these typos accordingly.

    1. Author Response

      Reviewer #1 (Public Review):

      We thank the Reviewer for their comments.

      Reviewer #2 (Public Review):

      1) In Figure 4, the authors injected a retrograde tracer in the NA and an anterograde tracer in DCN to find potential "nodes" of overlap. From this experiment, the authors identify the VTA and regions of the thalamus as potential areas of tracer overlap, but it is unclear how many other brain regions were examined. Did the authors jump straight to likely locations of overlap based on previous findings, or were large swaths of the brain examined systematically? If other brain regions were examined, which regions and how was this done? A table listing which brain regions were examined and the presence/intensity of ctb-Alexa568 and GFP fluorescence would be helpful.

      We thank the Reviewer for their comments. Exhaustive characterizations of inputs into nucleus accumbens (NAc) as well as of direct outputs of the deep cerebellar nuclei (DCN) have appeared elsewhere (e.g., Ma et al., 2020 doi: 10.3389/fnsys.2020.00015; Novello et al., 2022 doi: 10.1007/s12311-022-01499-w). Our anatomical investigations with retrograde and anterograde tracers were focused on putative intermediary nodal regions with robust inputs from the DCN, clear outputs to NAc, and limbic functionality. Only a handful of brain regions fulfill these criteria, and from those, we chose to target the VTA and intralaminar thalamus based on the observation that cerebellar activation induces dopamine release in the NAc medial shell and core (Holloway et al., 2019 doi: 10.1007/s12311-019-01074-w; Low et al., 2021 doi: 10.1038/s41586-021-04143-5) and on our previous work on DCN projections to the midline thalamus (Jung et al., 2022 doi: 10.3389/fnsys.2022.879634).

      2) In Figure 5, the authors inject AAV1-Cre in DCN and AAV-FLEX-tdTomato in VTA or thalamus. This is an interesting experiment, but controls are missing. An important control is to inject AAV-FLEX-tdTomato in the VTA or thalamus in the absence of AAV1-Cre injection in DCN. Cre-independent expression of tdTomato should be assessed in the VTA/thalamus and the NA.

      We thank the reviewer for bringing up this important control. We routinely perform this control experiment to test for any “leakiness” of floxed vectors prior to use but we did not include it in the paper. In response to the Reviewer’s comment, we show results from this control below. Briefly, we performed a large injection (300 nl) of AAV-FLEX-tdTomato in the thalamus area together with AAV-EGFP (to visualize the injection). No Cre-expressing virus was injected anywhere in the brain. PFA-fixed brain slices were obtained at 3 weeks post-injection and imaged for EGFP and tdTomato. Author Response Figure 1 shows images of the injected thalamus area. No tdTomato expression (Fig. 1C, red) was observed despite abundant EGFP expression (Fig. 1B, green), which confirms that in the absence of Cre the floxed construct does not “leak”.

      Author response image 1.

      (related to Fig. 5 of manuscript). Control experiment for “leakiness” of floxed tdTomato. A, Epifluorescence image of thalamus region in brain slice with EGFP (green) and tdTomato (red) channels merged. Gain settings in the red channel were increased intentionally, to ensure detection of any “leaky” cells. B, Cellular EGFP expression marks successful viral injection. C, No cellular expression of tdTomato without Cre. Note diffuse red signal from background fluorescence.

      Reviewer #3 (Public Review):

      1) The novelty of this paper lies in the mapping of projections from the interposed and the lateral nuclei of the cerebellum, as the authors themselves mention. However, in some of the experiments the medial nucleus is also clearly injected (Fig. 4B and 6B). In those experiments, it is impossible to distinguish which nucleus these projections come from, and they could be the ones from the medial nucleus that were previously described (see above).

      We thank the Reviewer for their comments. As stated in the Results and in the legend of Fig. 4, in addition to experiments with injections in all DCN (Fig. 4B-D), we also performed experiments with injections in only the lateral nucleus (Fig. 4E and F). The results of these experiments support the existence of lateral DCN projections that overlap with NAc-projecting neurons in VTA and thalamus. This finding is further corroborated by our transsynaptic experiments with lateral DCN-only injections (Fig. 5E,F). Regarding the optophysiological experiments, as mentioned in the Results, all DCN were injected (Fig. 6B) in order to maximize ChR2 expression and the chances of successful stimulation of projections.

      2) A strength of the paper is the use of both electrical and optogenetic stimulation. However, the responses to the two in the NAc are very different - electrical stimulation results in both excitation and inhibition, whereas opto stimulation mostly results in only excitation.

      We thank the Reviewer for this comment. At least two different explanations could account for the observed differences in the prevalence of inhibitory responses elicited by optogenetic vs electrical stimulation: i) inhibitory response prevalence is sensitive to stimulation intensity (Table 1 and Fig. 2B). Because of inherent differences between optogenetic and electrical stimulation, it is not possible to directly compare intensities between the two modalities in order to determine at which intensities, if at all, the prevalence of responses should match. The observation that, at least in the VTA, the prevalence of inhibitory responses elicited by 1 mW light intensity (the lowest intensity that we tested) was comparable to the prevalence of inhibitory responses elicited by 100 µA electrical stimulation is in line with this explanation; ii) DCN electrical stimulation is not node-specific. To our knowledge, there is currently no evidence to support mesoscale topographic organization in lateral and interposed DCN that is node-specific. Consequently, electrical stimulation of DCN could, in principle, result in NAc responses through various polysynaptic pathways and/or nodes. This possibility would still exist even if electrical stimulation had limited spread of a few hundred microns (as in our experiments) and is at least partly supported by the observed long latencies of electrically-evoked inhibitory responses (NAcCore: 268 ± 25 ms (10th percentile: 42 ms), NAcMed: 259 ± 14 ms (10th percentile: 60 ms)). Our recognition of this intrinsic limitation of DCN electrical stimulation was the impetus behind the node-specific optogenetic experiments.

      3) The stimulation frequency at which the electrical stimulation in Fig 1 is done to identify responses in the NAc is 200 Hz for 25 ms. Is this physiological? In addition, responses in the NAc are measured for 500 ms after, which is a very long response time.

      Regarding stimulation frequency, DCN neurons readily reach firing rates between 100-200 Hz in vivo and ex vivo (e.g., Beekhof et al., 2021 doi.org/10.3390/cells10102686; Sarnaik & Raman, 2018 doi:10.7554/eLife.29546; Canto et al., 2016 doi:10.1371/journal.pone.0165887). Regarding the duration of the response window, at the outset of our experiments we were agnostic to the type of responses that our stimulation protocols would evoke in NAc. We therefore established a response time window that would allow us to capture both fast neurotransmitter-mediated responses as well as neuromodulatory (e.g., dopaminergic) responses, which are known to occur at hundred-millisecond latencies or longer (Wang et al., 2017 doi.org/10.1016/j.celrep.2017.02.062; Chuhma et al., 2014 doi:10.1016/j.neuron.2013.12.027; Gonon, 1997). A posteriori analysis indicated that even if we reduced the response time window by 50%, the ratio of DCN-evoked excitatory vs inhibitory responses in NAc would not change substantially (E/I500: 4.3 vs E/I250: 5).
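
      The window comparison described above can be illustrated with a minimal sketch. It assumes, hypothetically, that each responsive NAc unit is summarized by a response sign and an onset latency, and that a response counts toward a window when its onset falls inside it. The function name `ei_ratio` and the example latencies are illustrative placeholders, not the data or the analysis code of the study.

```python
from typing import Iterable, Tuple

# Each response: (sign, onset latency in ms); sign is "E" (excitatory) or "I" (inhibitory).
Response = Tuple[str, float]

def ei_ratio(responses: Iterable[Response], window_ms: float) -> float:
    """Ratio of excitatory to inhibitory responses whose onset lies within the window."""
    responses = list(responses)
    n_exc = sum(1 for sign, lat in responses if sign == "E" and lat <= window_ms)
    n_inh = sum(1 for sign, lat in responses if sign == "I" and lat <= window_ms)
    if n_inh == 0:
        raise ValueError("no inhibitory responses in window; ratio undefined")
    return n_exc / n_inh

# Hypothetical latencies, for illustration only (not data from the study).
example = [("E", 20.0), ("E", 35.0), ("E", 120.0), ("E", 300.0),
           ("I", 60.0), ("I", 270.0)]

ratio_500 = ei_ratio(example, 500.0)  # full 500 ms window
ratio_250 = ei_ratio(example, 250.0)  # window reduced by 50%
```

      Shortening the window removes late responses of both signs, so the ratio shifts only to the extent that late responses are unevenly split between excitation and inhibition.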

      4) Previous studies have described how different cell types within the DCN have different downstream projections (Fujita et al. 2020). However, the experiments here bundle together all this known heterogeneity.

      We agree with the Reviewer that dissecting the contributions of specific DCN cell types to NAc circuits is an important next step, which we are excited to undertake in future studies. Here we have broken new ground by identifying for the first time nodes and functional properties of DCN-NAc connectivity. Performing these studies with DCN cell type-specificity was not justified or feasible, given that the molecular identity of participating DCN neurons is currently unknown.

      5) Previous studies have also highlighted the importance of different cell types within the NAc and how input streams are differentially targeted to them. Here, that heterogeneity is also obscured.

      Along the same lines as #4, we agree with the Reviewer about the importance of examining the cellular identity of NAc neurons that participate in DCN-NAc circuitry. We are excited to undertake these examinations using ex vivo approaches, which are well suited to dissect cellular and molecular mechanisms.

      6) In Fig. 4C, E and F, the experiments on overlap between anterograde and retrograde tracers are not particularly convincing - it's hard to see the overlap.

      We thank the reviewer for this comment and have included revised figure panels 4C5, E3, Author response image 1 and Figure 2 below. Single optical sections with altered color scheme and orthogonal confocal views are presented in order to show the overlap between DCN projections and NAc-projecting nodal neurons more clearly. The findings of these imaging experiments are corroborated by the results of our transsynaptic labeling approach (Fig. 5), which we have validated elsewhere (Jung et al., 2022 doi:10.3389/fnsys.2022.879634; and Author response image 1).

      Author response image 2.

      (related to Fig. 4 of manuscript). Co-localization of NAc-projecting neurons with DCN axonal projections in VTA and thalamus. A-D, Single optical sections and orthogonal views are shown. Green: EGFP-expressing DCN axons; white: ctb-Alexa 568; red: anti-TH (A-B; VTA) or NeuN (C-D; thalamus). White arrows in orthogonal views point to regions of overlap.

    1. Author Response

      Reviewer #1 (Public Review):

      Comment 1:

      The pharmacological tools used in this study are highly non-selective. Gd3+, used here to block NALCN is actually more commonly used to block TRP channels. 2-APB inhibits not only TRPC channels, but also TRPM and IP3 receptors while stimulating TRPV channels (Bon and Beech, 2013), while FFA actually stimulates TRPC6 channels while inhibiting other TRPCs (Foster et al., 2009).

      We agree with the reviewer that the substances mentioned are not specific. Although we performed shRNA experiments against NALCN and TRPC6, we plan to use more specific pharmacological modulators for these two channels: L703,606 (an antagonist of NALCN) [1] and larixyl acetate (a potent TRPC6 inhibitor) [2]. We have already completed the experiments with larixyl acetate, and the results are shown in Author response image 1.

      Author response image 1.

      Example time-course (A), traces (B) and summarized data (C) for the effect of larixyl acetate (LA), an antagonist of the TRPC6 channel, on the spontaneous firing activity of VTA DA neurons. Paired-sample t-test, ** P < 0.01. n is the number of neurons recorded and N is the number of mice used.

      Comment 2:

      The multimodal approach including shRNA knockdown experiments alleviates much of the concern about the non-specific pharmacological agents. Therefore, the author's claim that NALCN is involved in VTA dopaminergic neuron pacemaking is well-supported.

      However, the claim that TRPC6 is the key TRPC channel in VTA spontaneous firing is somewhat, but not completely supported. As with NALCN above, the pharmacology alone is much too non-specific to support the claim that TRPC6 is the TRP channel responsible for pacemaking. However, unlike the NALCN condition, there is an issue with interpreting the shRNA knockdown experiments. The issue is that TRPC channels often form heteromers with TRPC channels of other types (Goel, Sinkins and Schilling, 2002; Strübing et al., 2003). Therefore, it is possible that knocking down TRPC6 is interfering with the normal function of another TRPC channel, such as TRPC7 or TRPC4.

      Following your advice, we plan to perform single-cell qPCR experiments to check the expression levels of other TRPC channels after selective knockdown of TRPC6 in VTA DAT+ neurons; the results will be shown in the revised version. In our single-cell RNA-seq results, TRPC7 and TRPC4 are not as broadly expressed as TRPC6 in VTA DA neurons; it is therefore possible that knocking down TRPC6 does not interfere with the normal function of another TRPC channel, such as TRPC7 or TRPC4.

      Comment 3:

      The claim that TRPC6 channels in the VTA are involved in the depressive-like symptoms of CMUS is supported.

      However, the connection between the mPFC-projecting VTA neurons, TRPC6 channels, and the chronic unpredictable stress model (CMUS) of depression is not well supported. In Figure 2, it appears that the mPFC-projecting VTA neurons have very low TRPC6 expression compared to VTA neurons projecting to other targets. However, in figure 6, the authors focus on the mPFC-projecting neurons in their CMUS model and show that it is these neurons that are no longer sensitive to pharmacological agents non-specifically blocking TRPC channels (2-APB, see above comment). Finally, in figure 7, the authors show that shRNA knockdown of TRPC6 channels (in all VTA dopaminergic neurons) results in depressive-like symptoms in CMUS mice. Due to the low expression of TRPC6 in mPFC-projecting VTA neurons, the authors' claim of "broad and strong expression of TRPC6 channels across VTA DA neurons" is not fully supported. Because of the messy pharmacological tools used, it cannot be claimed that TRPC6 in the mPFC-projecting VTA neurons is altered after CMUS. And because the knockdown experiments are not specific to mPFC-projecting VTA neurons, it cannot be claimed that reducing TRPC6 in these specific neurons is causing depressive symptoms.

      We focused on the mPFC-projecting VTA DA neurons because this pathway is implicated in the depressive-like behaviors of the CMUS model [3-5]. Although mPFC-projecting VTA DA neurons seem to have lower levels of TRPC6, we reason that the channel is still functional there. However, we agree with the reviewer that the statement “broad and strong expression of TRPC6 channels across VTA DA neurons” is not fully supported. We have changed the statement as the reviewer suggested. Furthermore, we plan to selectively knock down TRPC6 in the mPFC-projecting VTA DA neurons and then study the behavior.

      Comment 4:

      It is important to note that the experiments presented in Figure 1 have all been previously performed in VTA dopaminergic neurons (Khaliq and Bean, 2010) including showing that low calcium increases VTA neuron spontaneous firing frequency and that replacement of sodium with NMDG hyperpolarizes the membrane potential.

      We agree with the reviewer that similar experiments have been performed previously [6]; we included them for the flow of our manuscript and for general readers.

      Comment 5:

      The authors' explanation for the increase in firing frequency in 0 calcium conditions is that calcium-activated potassium channels would no longer be activated. However, there is a highly relevant finding that low calcium enhances the NALCN conductance through the calcium sensing receptor from Dejian Ren's lab (Lu et al., 2010) which is not cited in this paper. This increase in NALCN conductance with low calcium has been shown in SNc dopaminergic neurons (Philippart and Khaliq, 2018), and is likely a factor contributing to the low-calcium-mediated increase in spontaneous VTA neuron firing.

      We agree with the reviewer and thank them for the suggestion. A discussion of this point has been added.

      Comment 6:

      One of the only demonstrations of the expression and physiological significance of TRPCs in VTA DA neurons was published by (Rasmus et al., 2011; Klipec et al., 2016) which are not cited in this paper. In their study, TRPC4 expression was detected in a uniformly distributed subset of VTA DA neurons, and TRPC4 KO rats showed decreased VTA DA neuron tonic firing and deficits in cocaine reward and social behaviors.

      We thank the reviewer for the suggestion. The references and a discussion of this point have been added.

      Comment 7:

      Out of all seven TRPCs, TRPC5 is the only one reported to have basal/constitutive activity in heterologous expression systems (Schaefer et al., 2000; Jeon et al., 2012). Other TRPCs such as TRPC6 are typically activated by Gq-coupled GPCRs. Why would TRPC6 be spontaneously/constitutively active in VTA DA neurons?

      In the complex neuronal environment in which VTA DA neurons reside, multiple modulatory factors, including GPCRs, could be dynamically active, which could lead to the activation of TRP channels including TRPC6.

      Comment 8:

      A new paper from the group of Myoung Kyu Park (Hahn et al., 2023) shows in great detail the interactions between NALCN and TRPC3 channels in pacemaking of SNc DA neurons.

      The reference mentioned has been added. We thank the reviewer.

      Reviewer #2 (Public Review):

      Comment 1:

      These results do not show that TRPC6 mediates stress effects on depression-like behavior. As stated by the authors in the first sentence of the final paragraph, "downregulation of TRPC6 proteins was correlated with reduced firing activity of the VTA DA neurons, the depression-like behaviors, and that knocking down of TRPC6 in the VTA DA neurons confer the mice with depression behaviors." Therefore, the results show associations between TRPC6 downregulation and stress effects on behavior, occlusion of the effects of one by the other on some outcome measures, and cell manipulation effects that resemble stress effects. There is no experiment that shows reversal of stress effects with cell/circuit-specific TRPC6 manipulations. Please adjust the title, abstract and interpretation accordingly.

      We agree with the reviewer’s suggestion. The title was changed to “The cation channel mechanisms of subthreshold inward depolarizing currents in the VTA dopaminergic neurons and their roles in the chronic stress-induced depression-like behavior” and the abstract and interpretation were also adjusted accordingly.

      Comment 2:

      Statistical tests and results are unclear throughout. For all analyses, please report specific tests used, factors/groups, test statistic and p-value for all data analyses reported. In some cases, the chosen test is not appropriate. For example, in Figure 6E, it is not clear how an experiment with 2 factors (stress and drug) can be analyzed with a 1-way RM ANOVA. The potential impact of inappropriate statistical tests on results makes it difficult to assess the accuracy of data interpretation.

      We have redone the statistical analysis as suggested by the reviewer and added specific tests used, factors/groups, test statistic and p-value for all data analyses into the revised manuscript.

      Comment 3:

      Why were only male mice used? Please justify and discuss in the manuscript. Also, change the title to reflect this.

      Although most similar previous studies used male mice or rats [7, 8], we agree with the reviewer that female animals should also be tested, given the possible role of sex hormones. We therefore plan to repeat some key experiments in female mice.

      Comment 4:

      Number of recorded cells is very low in Figure 1. Where in VTA did recordings occur? Given the heterogeneity in this brain region, this n may be insufficient. Additional information (e.g., location within VTA, criteria used to identify neurons) should be included. Report the number of mice (i.e., n = 6 cells from X mice) in all figures.

      We agree that the number here is not high. More experiments will be performed to increase the N/n numbers. The location of the recorded cells in the VTA and the number of mice used are now shown in all figures; the criteria used to identify neurons are stated in the Methods section “Identification of DA neurons and electrophysiological recordings”. At the end of the electrophysiological recordings, the recorded VTA neurons were collected for single-cell PCR. VTA DA neurons were identified by single-cell PCR for the presence of TH and DAT.

      Comment 5:

      Authors refer to VTA DA neurons as those that are DAT+ in line 276, although TH expression is considered the standard of DAergic identity, and studies (e.g., Lammel et al, 2008) have shown that a subset of VTA DA neurons have low levels of DAT expression. Authors should reword/clarify that these are DAT-expressing VTA DA neurons.

      The study published by Lammel [9] in 2015 showed low dopamine specificity of transgene expression in the ventral midbrain of TH-Cre mice; in contrast, DAT-Cre mice exhibit dopamine-specific Cre expression patterns, although DAT-Cre mice are likely to have their own limitations (for example, low DAT expression in mesocortical DA neurons may make it difficult to target this subpopulation; see Lammel et al., 2008 [10]). Hence, in our study, DAT was used as the criterion to identify DA neurons. In addition, both TH and DAT were tested by single-cell PCR to confirm that the recorded cells were DA neurons.

      Comment 6:

      Neuronal subtype proportions should be quantified and reported (Fig. 1Aii).

      Neuronal subtype proportions are now quantified and reported in Fig. 1Aii.

      Comment 7:

      In addition to reporting projection specificity of neurons expressing specific channels, it would be ideal to report these data according to spatial location in VTA.

      The spatial locations of the recorded cells in the VTA are now shown in all figures.

      Comment 8:

      The authors state that there are a small number of Glut neurons in VTA, then they state that a "significant proportion" of VTA neurons are glutamatergic.

      Thanks, “a significant proportion of neurons” has been changed to “less than half of sequenced DA neurons”.

      Comment 9:

      It is an overstatement that VTA DA neurons are the key determinant of abnormal behaviors in affective disorders.

      Thanks, we have amended the statement to “Dopaminergic (DA) neurons in the ventral tegmental area (VTA) play an important role in mood, reward and emotion-related behaviors”.

      Reviewer #3 (Public Review):

      Comment 1:

      The authors of this study have examined which cation channels specifically confer to ventral tegmental area dopaminergic neurons their autonomic (spontaneous) firing properties. Having brought evidence for the key role played by NALCN and TRPC6 channels therein, the authors aimed at measuring whether these channels play some role in so-called depression-like (but see below) behaviors triggered by chronic exposure to different stressors. Following evidence for a down-regulation of TRPC6 protein expression in ventral tegmental area dopaminergic cells of stressed animals, the authors provide evidence through viral expression protocols for a causal link between such a down-regulation and so-called depression-like behaviors. The main strength of this study lies on a comprehensive bottom-up approach ranging from patch-clamp recordings to behavioral tasks. However, the interpretation of the results gathered from these behavioral tasks might also be considered one main weakness of the abovementioned approach. Thus, the authors make a confusion (widely observed in numerous publications) with regard to the use of paradigms (forced swim test, tail suspension test) initially aimed (and hence validated) at detecting the antidepressant effects of drugs and which by no means provide clues on "depression" in their subjects. Indeed, in their hands, the authors report that stress elicits changes in these tests which are opposed to those theoretically seen after antidepressant medication. However, these results do not imply that these changes reflect "depression" but rather that the individuals under scrutiny simply show different responses from those seen in nonstressed animals. These limits are even more valid in nonstressed animals injected with TRPC6 shRNAs (how can 5-min tests be compared to a complex and chronic pathological state such as depression?). 
      With regard to anxiety, as investigated with the elevated plus-maze and the open field, the data, as reported, do not allow to check the author's interpretation as anxiety indices are either not correctly provided (e.g. absolute open arm data instead of percents of open arm visits without mention of closed arm behaviors) or subjected to possible biases (lack of distinction between central and peripheral components of the apparatus).

      We agree with the reviewer that it is debatable whether the behavioral tests we used represent a true depressive state; this is an open question that can be discussed from different perspectives. Because these tests (forced swimming and tail suspension) are, as the reviewer noted, “widely observed in numerous publications”, we used them as the seemingly only available options to reflect a “depression-like” state. One could argue that, since these tests were initially validated for testing antidepressants, with decreased immobility time indicating an antidepressant effect, an increased immobility time may reflect a “depression-like” state. As for the anxiety tests, absolute times in both open and closed arms are now provided.
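
      The percentage-based index the reviewer refers to can be derived from absolute open-arm and closed-arm values. The following is a minimal sketch, assuming the common convention of expressing the open-arm measure as a percentage of the open plus closed total; the function name `percent_open` and the example values are hypothetical and are not data from the study.

```python
def percent_open(open_value: float, closed_value: float) -> float:
    """Open-arm measure as a percentage of the open + closed total."""
    total = open_value + closed_value
    if total <= 0:
        raise ValueError("open + closed must be positive")
    return 100.0 * open_value / total

# Hypothetical single-animal values, for illustration only.
pct_time = percent_open(open_value=45.0, closed_value=180.0)  # seconds in each arm type
pct_entries = percent_open(open_value=4, closed_value=12)     # number of arm entries
```

      Normalizing to the open plus closed total separates arm preference from overall locomotor activity, which is why closed-arm data are needed alongside the open-arm values.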

    1. Author response:

      Responses to Editors:

      We appreciate Reviewer 1’s first concern regarding the difficulty of disentangling the contributions of tightly-coupled brain regions to the speech-gesture integration process—particularly due to the close temporal and spatial proximity of the stimulation windows and the potential for prolonged disruption. We would like to provide clarification and evidence supporting the validity of our methodology.

      Our previous study (Zhao et al., 2021, J. Neurosci) employed the same experimental protocol—using inhibitory double-pulse transcranial magnetic stimulation (TMS) over the inferior frontal gyrus (IFG) and posterior middle temporal gyrus (pMTG) in one of eight 40-ms time windows. The findings from that study demonstrated a time-window-selective disruption of the semantic congruency effect (i.e., reaction time costs driven by semantic conflict), with no significant modulation of the gender congruency effect (i.e., reaction time costs due to gender conflict). This result establishes that double-pulse TMS provides sufficient temporal precision to independently target the left IFG and pMTG within these 40-ms windows during gesture-speech integration. Importantly, by comparing the distinctively inhibited time windows for IFG and pMTG, we offered clear evidence of distinct engagement and temporal dynamics between these regions during different stages of gesture-speech semantic processing.
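
      The congruency effect described above is a reaction time cost, and a time-window-selective disruption corresponds to a reduction of that cost under stimulation in a given window. The following is a minimal sketch of this arithmetic, assuming the cost is the difference between mean reaction times in incongruent and congruent trials and that disruption is measured against a control stimulation condition; the control condition, the function name and all values are hypothetical, not data from the studies cited.

```python
from statistics import mean

def congruency_effect(rt_incongruent, rt_congruent):
    """Reaction time cost (ms): mean incongruent RT minus mean congruent RT."""
    return mean(rt_incongruent) - mean(rt_congruent)

# Hypothetical reaction times (ms) for one 40-ms time window, for illustration only.
control = {"incongruent": [620, 640, 660], "congruent": [580, 600, 620]}
target = {"incongruent": [610, 620, 630], "congruent": [590, 600, 610]}

effect_control = congruency_effect(control["incongruent"], control["congruent"])
effect_target = congruency_effect(target["incongruent"], target["congruent"])

# A positive value means stimulation of the target site reduced the congruency effect.
disruption = effect_control - effect_target
```

      Repeating this comparison for each time window and each site gives the window-by-window profile from which distinct temporal involvement of the two regions can be read.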

      Furthermore, we reviewed prior studies utilizing double-pulse TMS on structurally and functionally connected brain regions to explore neural contributions across timescales as brief as 3–60 ms. These studies, which encompass areas from the tongue and lip areas of the primary motor cortex (M1) to high-level semantic regions such as the pMTG and ATL (Author response table 1), consistently demonstrate the methodological rigor and precision of double-pulse TMS in disentangling the neural dynamics of different regions within these short temporal windows.

      Author response table 1.

      Double-pulse TMS studies on brain regions over 3–60 ms time intervals

      Response to Reviewer #1:

      (1) For concern on the difficulty of disentangling the contributions of tightly-coupled brain regions to the speech-gesture integration process:

      We trust that the explanation provided above has clarified this issue.

      (2) For concern on the rationale for delivering HD-tDCS/TMS in set time windows for each region, as well as how these time windows were determined and how the current results compare to our previous studies from 2018 and 2023:

      The current study builds on a series of investigations that systematically examined the temporal and spatial dynamics of gesture-speech integration. In our earlier work (Zhao et al., 2018, J. Neurosci), we demonstrated that interrupting neural activity in the IFG or pMTG using TMS selectively disrupted the semantic congruency effect (reaction time costs due to semantic incongruence), without affecting the gender congruency effect (reaction time costs due to gender incongruence). These findings identified the IFG and pMTG as critical hubs for gesture-speech integration. This informed the brain regions selected for subsequent studies.

      In Zhao et al. (2021, J. Neurosci), we employed a double-pulse TMS protocol, delivering stimulation within one of eight 40-ms time windows, to further examine the temporal involvement of the IFG and pMTG. The results revealed time-window-selective disruptions of the semantic congruency effect, confirming the dynamic and temporally staged roles of these regions during gesture-speech integration.

      In Zhao et al. (2023, Frontiers in Psychology), we investigated the semantic predictive role of gestures relative to speech by comparing two experimental conditions: (1) gestures preceding speech by a fixed interval of 200 ms, and (2) gestures preceding speech at its semantic identification point. We observed time-window-selective disruptions of the semantic congruency effect in the IFG and pMTG only in the second condition, leading to the conclusion that gestures exert a semantic priming effect on co-occurring speech. These findings underscored the semantic advantage of gesture in facilitating speech integration, further refining our understanding of the temporal and functional interplay between these modalities.

      The design of the current study—including the choice of brain regions and time windows—was directly informed by these prior findings. Experiment 1 (HD-tDCS) targeted the entire gesture-speech integration process in the IFG and pMTG to assess whether neural activity in these regions, previously identified as integration hubs, is modulated by changes in informativeness from both modalities (i.e., entropy) and their interactions (mutual information, MI). The results revealed a gradual inhibition of neural activity in both areas as MI increased, evidenced by a negative correlation between MI and the tDCS inhibition effect in both regions. Building on this, Experiments 2 and 3 employed double-pulse TMS and event-related potentials (ERPs) to further assess whether the engaged neural activity was both time-sensitive and staged. These experiments also evaluated the contributions of various sources of information, revealing correlations between information-theoretic metrics and time-locked brain activity, providing insights into the ‘gradual’ nature of gesture-speech integration.

      We acknowledge that the rationale for the design of the current study was not fully articulated in the original manuscript. In the revised version, we will provide a more comprehensive and coherent explanation of the logic behind the three experiments, ensuring clear alignment with our previous findings.
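
      For readers less familiar with the information-theoretic metrics referred to above, the following is a minimal sketch of Shannon entropy and mutual information computed from paired categorical labels. It assumes the standard definitions, with MI taken as H(X) + H(Y) - H(X, Y); the function names and the toy labels are hypothetical, and how the manuscript derives the underlying response distributions is not specified in this response.

```python
from collections import Counter
from math import log2


def shannon_entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((n / total) * log2(n / total) for n in counts.values())


def mutual_information(x, y):
    """Mutual information (in bits) of paired labels: H(X) + H(Y) - H(X, Y)."""
    if len(x) != len(y):
        raise ValueError("x and y must be paired, equal-length sequences")
    joint = list(zip(x, y))
    return shannon_entropy(x) + shannon_entropy(y) - shannon_entropy(joint)


# Hypothetical toy data: fully redundant gesture and speech labels.
gesture = ["hammer", "hammer", "cut", "cut"]
speech = ["hammer", "hammer", "cut", "cut"]

# Hypothetical toy data: labels that carry no shared information.
gesture_indep = ["a", "a", "b", "b"]
speech_indep = ["x", "y", "x", "y"]

mi_redundant = mutual_information(gesture, speech)
mi_independent = mutual_information(gesture_indep, speech_indep)
```

      In this toy case the fully redundant pairing gives an MI equal to the entropy of either modality, whereas the independent pairing gives an MI of zero, which is the sense in which MI indexes the overlap between the two information sources.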

      (3) For concern about the use of Pearson correlation and the normality of EEG data.

      We appreciate the reviewer’s thoughtful consideration. In Figure 5 of the manuscript, we have already included normal distribution curves that illustrate the relationships between the average ERP amplitudes within each ROI or elicited clusters and the three information models. Additionally, multiple comparisons were addressed using FDR correction, as outlined in the manuscript.

      To further clarify the data, we will apply the Shapiro-Wilk test, a widely accepted method for assessing normality, to both the MI/entropy and averaged ERP data. The corresponding p-values will be provided in the follow-up point-by-point responses.
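
      As an illustration of the FDR correction mentioned in this response, the sketch below implements the Benjamini-Hochberg step-up procedure. This is an assumption: the response states only that FDR correction was used, not which variant. The function name and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the test is rejected at FDR level alpha."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank

    # Reject every hypothesis whose rank is at or below that cutoff.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values from five correlation tests.
p_values = [0.001, 0.008, 0.039, 0.041, 0.20]
decisions = benjamini_hochberg(p_values)
```

      With these made-up values only the two smallest p-values survive correction, even though four of the five fall below an uncorrected threshold of 0.05.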

      (4) For concern about the ROI selection, and the suggestion of using whole-brain electrodes to build models of different variables (MI/entropy) to predict neural responses:

      For the EEG data, we conducted both a traditional region-of-interest (ROI) analysis, with ROIs defined based on a well-established work (Habets et al., 2011), and a cluster-based permutation approach, which utilizes data-driven permutations to enhance robustness and address multiple comparisons. The latter method complements the hypothesis-driven ROI analysis by offering an exploratory, unbiased perspective. Notably, the results from both approaches were consistent, reinforcing the reliability of our findings.

      To make the methods more accessible to a broader audience, we will provide a clear description of the methods used and how they relate to each other in the revised manuscript.

      Reference:

      Habets, B., Kita, S., Shao, Z.S., Ozyurek, A., and Hagoort, P. (2011). The Role of Synchrony and Ambiguity in Speech-Gesture Integration during Comprehension. J Cognitive Neurosci 23, 1845-1854. 10.1162/jocn.2010.21462

      (5) For concern about the median split of the data:

      To identify ERP components or spatiotemporal clusters that demonstrated significant semantic differences, we split each model into higher and lower halves, focusing on indexing information changes reflected by entropy or mutual information (MI). To illustrate the gradual activation process, the identified components and clusters were further analyzed for correlations with each information matrix. Remarkably, consistent results were observed between the ERP components and clusters, providing robust evidence that semantic information conveyed through gestures and speech significantly influenced the amplitude of these components or clusters. Moreover, the semantic information was shown to be highly sensitive, varying in tandem with these amplitude changes.

      We acknowledge that the rationale behind this approach may not have been sufficiently clear in the initial manuscript. In our revision, we will ensure a more detailed and precise explanation to enhance the clarity and coherence of this logical framework.
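
      The median split described above can be sketched as follows. The item names and scores are hypothetical, and assigning ties at the median to the lower half is an assumption that this response does not specify.

```python
from statistics import median


def median_split(items, scores):
    """Split items into (lower, higher) halves by the median of their scores.

    Items scoring exactly at the median go to the lower half (an assumption).
    """
    cut = median(scores)
    lower = [item for item, score in zip(items, scores) if score <= cut]
    higher = [item for item, score in zip(items, scores) if score > cut]
    return lower, higher


# Hypothetical stimuli with made-up mutual information scores.
stimuli = ["item_a", "item_b", "item_c", "item_d"]
mi_scores = [0.2, 0.9, 0.5, 0.7]

lower_half, higher_half = median_split(stimuli, mi_scores)
```

      The two halves would then be contrasted to locate components or clusters, while the continuous scores are retained for the subsequent correlation analysis.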

      Response to Reviewer #2:

      We greatly appreciate Reviewer 2’s concern regarding whether "mutual information" adequately captures the interplay between the meanings of speech and gesture. We would like to clarify that the materials used in the present study involved gestures performed without actual objects, paired with verbs that precisely describe the corresponding actions. For example, a hammering gesture was paired with the verb “hammer”, and a cutting gesture was paired with the verb “cut”. In this design, all gestures conveyed redundant meaning relative to the co-occurring speech, creating significant overlap between the information derived from speech alone and that from gesture alone.

      We understand the reviewer’s concern about cases where gestures and speech may provide complementary rather than redundant information. To address this, we have developed an alternative metric for quantifying information gains contributed by supplementary multisensory cues, which will be explored in a subsequent study. However, for the present study, we believe that the observed overlap in information serves as an indicator of the degree of multisensory convergence, a central focus of our investigation.

      Regarding the reviewer’s concern about how the neural processes of speech-gesture integration may change with variations in the relative timing between speech and gesture stimuli, we would like to highlight findings from our previous study (Zhao et al., 2023, Frontiers in Psychology). In that study, we explored the semantic predictive role of gestures relative to speech under two conditions: (1) gestures preceding speech by a fixed interval of 200 ms, and (2) gestures preceding speech at its semantic identification point. Interestingly, only in the second condition did we observe time-window-selective disruptions of the semantic congruency effect in the IFG and pMTG. This led us to conclude that gestures play a semantic priming role for co-occurring speech. Building on this, we designed the present study with gestures preceding speech at its semantic identification point to reflect this semantic priming relationship. Additionally, ongoing research is exploring gesture and speech interactions in natural conversational settings to investigate whether the neural processes identified here are consistent across varying contexts.

      To prevent any similar concerns from causing doubt among the audience and to ensure clarity regarding the follow-up study, we will provide a detailed discussion of the two issues in the revised manuscript.

      Response to Reviewer #3:

      The primary aim of this study is to investigate whether the degree of activity in the established integration hubs, IFG and pMTG, is influenced by the information provided by gesture-speech modalities and/or their interactions. While we provided evidence for the differential involvement of the IFG and pMTG by delineating their dynamic engagement across distinct time windows of gesture-speech integration and associating these patterns with unisensory information and their interaction, we acknowledge that the mechanisms underlying these dynamics remain open to interpretation. Specifically, whether the observed effects stem from difficulties in semantic control processes, as suggested by Reviewer 3, or from resolving information uncertainty, as quantified by entropy, falls outside the scope of the current study. Importantly, we view these two interpretations as complementary rather than mutually exclusive, as both may be contributing factors. Nonetheless, we agree that addressing this question is a compelling avenue for future research. In the revised manuscript, we will include an exploratory analysis to investigate whether the confounding difficulty, stemming from the number of lexical or semantic representations, is limited to high-entropy items. Additionally, we will address and discuss alternative interpretations.

      Regarding the concern of conceptual equivocation, we would like to emphasize that this study represents the first attempt to focus on the relationship between information quantity and neural engagement. In our initial presentation, we inadvertently conflated the commonly used term "graded hub," which refers to anatomical distribution, with its usage in the present context. We sincerely apologize for this oversight and are grateful for the reviewer’s careful critique. In the revised manuscript, we will clearly articulate the study’s objectives, clarify the representations of entropy and mutual information, and accurately describe their association with neural engagement.

      Reference

      Teige, C., Mollo, G., Millman, R., Savill, N., Smallwood, J., Cornelissen, P. L., & Jefferies, E. (2018). Dynamic semantic cognition: Characterising coherent and controlled conceptual retrieval through time using magnetoencephalography and chronometric transcranial magnetic stimulation. Cortex, 103, 329-349.

      Amemiya, T., Beck, B., Walsh, V., Gomi, H., & Haggard, P. (2017). Visual area V5/hMT+ contributes to perception of tactile motion direction: a TMS study. Scientific reports, 7(1), 40937.

      Muessgens, D., Thirugnanasambandam, N., Shitara, H., Popa, T., & Hallett, M. (2016). Dissociable roles of preSMA in motor sequence chunking and hand switching—a TMS study. Journal of Neurophysiology, 116(6), 2637-2646.

      Vernet, M., Brem, A. K., Farzan, F., & Pascual-Leone, A. (2015). Synchronous and opposite roles of the parietal and prefrontal cortices in bistable perception: a double-coil TMS–EEG study. Cortex, 64, 78-88.

      Pitcher, D. (2014). Facial expression recognition takes longer in the posterior superior temporal sulcus than in the occipital face area. Journal of Neuroscience, 34(27), 9173-9177.

      Bardi, L., Kanai, R., Mapelli, D., & Walsh, V. (2012). TMS of the FEF interferes with spatial conflict. Journal of cognitive neuroscience, 24(6), 1305-1313.

      D’Ausilio, A., Bufalari, I., Salmas, P., & Fadiga, L. (2012). The role of the motor system in discriminating normal and degraded speech sounds. Cortex, 48(7), 882-887.

      Pitcher, D., Duchaine, B., Walsh, V., & Kanwisher, N. (2010). TMS evidence for feedforward and feedback mechanisms of face and body perception. Journal of Vision, 10(7), 671-671.

      Gagnon, G., Blanchet, S., Grondin, S., & Schneider, C. (2010). Paired-pulse transcranial magnetic stimulation over the dorsolateral prefrontal cortex interferes with episodic encoding and retrieval for both verbal and non-verbal materials. Brain Research, 1344, 148-158.

      Kalla, R., Muggleton, N. G., Juan, C. H., Cowey, A., & Walsh, V. (2008). The timing of the involvement of the frontal eye fields and posterior parietal cortex in visual search. Neuroreport, 19(10), 1067-1071.

      Pitcher, D., Garrido, L., Walsh, V., & Duchaine, B. C. (2008). Transcranial magnetic stimulation disrupts the perception and embodiment of facial expressions. Journal of Neuroscience, 28(36), 8929-8933.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      This is an interesting study on the role of FGF signaling in the induction of primitive streak-like cells (PS-LC) in human 2D-gastruloids. The authors use a previously characterized standard culture that generates a ring of PS-LCs (TBXT+) and correlate this with pERK staining. A requirement for FGF signaling in TBXT induction is demonstrated via pharmacological inhibition of MEK and FGFR activity. A second set of culture conditions (with no exogenous FGFs) suggests that endogenous FGFs are required for pERK and TBXT induction. The authors then characterize, via scRNA-seq, various components of the FGF pathway (genes for ligands, receptors, ERK regulators, and HSPG regulation). They go on to characterize the pFGFR1, receptor isoforms, and polarized localization of this receptor. Finally, they perform FGF4 inhibition and use a cell line with a limited FGF17 inactivation (heterozygous null) and show that loss of these FGFs reduces PS-LC and derivative cell types.

      Strengths:

      (1) As the authors point out, the role of FGF signaling in gastrulation is less well understood than other signaling pathways. Hence this is a valuable contribution to that field.

      (2) The FGF4 and FGF17 loss-of-function experiments in Figure 5 are very intriguing. This is especially so given the intriguing observation that these FGFs appear to be dominating in this model of human gastrulation, in contrast to what FGFs dominate in mice, chicks, and frogs.

      (3) In general this paper is valuable as a further development of the Human gastruloid system and the role of FGF signaling in the induction of PS-LCs. The wide net that the authors cast in characterizing the FGF ligand gene, receptor isoforms, and downstream components provides a foundation for future work. As the authors write near the beginning of the Discussion "Many questions remain."

      We thank the reviewer for these positive comments.

      Weaknesses:

      (1) FGFs are cell survival factors in various aspects of development. The authors fail to address cell death due to loss of FGF signaling in their experiments. For example, in Figure 1E (which requires statistical analysis) and 1G (the bottom FGFRi row), there appears to be a significant amount of cell loss. Is this due to cell death? The authors should address the question of whether the role of FGF/ERK signaling is to keep the cells alive.

      Indeed, FGF also strongly affects cell number and it is an interesting question to what extent this depends on ERK. Our manuscript focuses instead on the role of FGF/ERK signaling in cell fate patterning. However, as mentioned in our discussion, Figure 1d,e shows that doxycycline-induced pERK leads to more TBXT+ cells than the control without restoring cell number, suggesting that the role of FGF in controlling cell number is independent of the requirement for FGF/ERK in PS-LC differentiation. Unpublished data below showing a MEK inhibitor dose response further support this: low doses of MEKi are sufficient to inhibit differentiation without affecting cell number. To address the reviewer’s question, we will include these data in the revised manuscript and perform several additional experiments to determine in more detail how cell death and proliferation depend on FGF.

      Author response image 1.

      MEK inhibition affects differentiation and cell number at different doses. a-c) Control and MEKi-treated (0.3 µM) colonies with similar cell number but different TBXT expression. d-f) Quantification of cell number per colony (d), percentage of TBXT-positive cells per colony (e), and the distribution of pERK intensities for different doses of MEK inhibitor (f). N>6 colonies per condition. MEKi = PD0325901. Scale bar = 50 µm.

      (2) Regarding the sparse cells in 1G, is there a reduction in cell number only with FGFRi and not MEKi? Is this reproducible? Gattiglio et al (Development, 2023, PMID: 37530863) present data supporting a "community effect" in the FGF-induced mesoderm differentiation of mouse embryonic stem cells. Could a community effect be at play in this human system (especially given the images in the bottom row of 1G)? If the authors don't address this experimentally they should at least address the ideas in Gattiglio et al.

      Indeed, FGFRi reproducibly affects cell number more than MEKi, in line with the fact that pathways downstream of FGF other than MAPK/ERK (e.g. PI3K) play important roles in cell survival and growth. We think the lack of differentiation in MEKi and FGFRi in Fig.1g cannot be attributed to a loss of cells combined with a community effect. This is because, without FGFRi or MEKi, cells also differentiate to primitive streak at much lower densities than those shown, consistent with the data we show above in response to (1), which argue against a primarily indirect effect of FGF on PS-LC differentiation through cell density. In the context of directed differentiation (rather than 2D gastruloids), we will show this in a controlled manner by repeating the experiment in Fig.1g while adjusting cell seeding densities to obtain similar final cell densities in all three conditions. We will also include Gattiglio et al. in our revised discussion.

      (3) Do the FGF4 and FGF17 LOF experiments in Figure 5 affect cell numbers like FGFRi in Figure 1?

      It seems the effect on cell number is small, but we will analyze this carefully and include it in the revised manuscript. A small effect would be consistent with our unpublished data below showing a near-uniform proliferation rate. This in turn suggests that low levels of pERK in the center are sufficient to maintain proliferation there, while the much higher pERK levels in the PS-LC ring (that we think depend on FGF4 and FGF17) do not significantly increase the proliferation rate (see Fig.1 in the manuscript for the pERK pattern). Thus, loss of high pERK in the PS-LC ring while maintaining low pERK throughout would not be expected to have a major impact on cell number but would impact differentiation. In contrast, loss of all FGF signaling through FGFRi does dramatically affect cell number. This is again consistent with the data provided in response to (1) showing that ERK levels can be reduced to a point where PS-LC differentiation is lost without significantly affecting cell number. We will include the data below in the revised manuscript.

      Author response image 2.

      Why examine PS-LC induction only in FGF17 heterozygous cells and not homozygous FGF17 nulls?

      We were unable to obtain homozygous FGF17 nulls, and it is not clear whether there is a biological reason for this. We will try again and otherwise attempt to corroborate our findings with further knockdown data.

      (4) The idea that FGF8 plays a dominant role during gastrulation of other species but not humans is so intriguing it warrants deeper testing. The authors dismiss FGF8 because its mRNA "...levels always remained low." (line 363) as well as the data published in Zhai et al (PMID: 36517595) and Tyser et al (PMID: 34789876). But there are cases in mouse development where a gene was expressed at levels so low, that it might be dismissed, and yet LOF experiments revealed it played a role or even was required in a developmental process. The authors should consider FGF8 inhibition or inactivation to explore its potential role, despite its low levels of expression.

      We agree with the reviewer that FGF8 is worth investigating further and we will now pursue this.

      (5) Redundancy is a common feature in FGF genetics. What is the effect of inhibiting FGF4 in FGF17 LOF cells?

      We will attempt to do the experiment the reviewer suggests.

      (6) I suggest that the authors take more caution in describing FGF gradients. For example, in one Results heading they write "Endogenous FGF4 and FGF17 gradients underly the ERK activity pattern.", implying an FGF protein gradient. However, they only present data for FGF mRNA, not protein. This issue would be clarified if they used proper nomenclature for gene, mRNA (italics), and protein (no italics) throughout the paper.

      We will edit the paper to more clearly distinguish protein and mRNA.

      Reviewer #2 (Public review):

      Summary:

      The role of FGFs in embryonic development and stem cell differentiation has remained unclear due to its complexity. In this study, the authors utilized a 2D human stem cell-based gastrulation model to investigate the functions of FGFs. They discovered that FGF-dependent ERK activity is closely linked to the emergence of primitive streak cells. Importantly, this 2D model effectively illustrates the spatial distribution of key signaling effectors and receptors by correlating these markers with cell fate markers, such as T and ISL1. Through inhibition and loss-of-function studies, they further corroborated the needs of FGF ligands. Their data shows that FGFR1 is the primary receptor, and FGF2/4/17 are the key ligands for primitive streak development, which aligns with observations in primate embryos. Additional experiments revealed that the reduction of FGF4 and FGF17 decreases ERK activity.

      Strengths:

      This study provides comprehensive data and improves our understanding of the role of FGF signaling in primate primitive streak formation. The authors provide new insights related to the spatial localization of the key components of FGF signaling and attempt to reveal the temporal dynamics of the signal propagation and cell fate decision, which has been challenging.

      Weaknesses:

      Given the solid data, the work only partially clarifies the complex picture of FGF signaling, so details remain somewhat elusive. The findings lack a strong punchline, which may limit their broader impact.

      We thank this reviewer for their valuable feedback and the compliment on the solidity of our data. The punchline of our work is that FGF4- and FGF17-dependent ERK signaling plays a key role in human PS-LC differentiation, and that these are different FGFs than those thought to drive mouse gastrulation. A second key point is that, like BMP and TGFβ signaling, FGF signaling is restricted to the basolateral sides of pluripotent stem cell colonies due to polarized receptor expression, which is crucial for understanding the response to exogenous ligands added to the cell medium. Indeed, many facets of FGF signaling remain to be investigated in the future, such as how FGF regulates and is regulated by other signals, to which we will dedicate a separate manuscript.

      Reviewer #3 (Public review):

      Jo and colleagues set out to investigate the origins and functions of localized FGF/ERK signaling for the differentiation and spatial patterning of primitive streak fates of human embryonic stem cells in a well-established micropattern system. They demonstrate that endogenous FGF signaling is required for ERK activation in a ring-domain in the micropatterns, and that this localized signaling is directly required for differentiation and spatial patterning of specific cell types. Through high-resolution microscopy and transwell assays, they show that cells receive FGF signals through basally localized receptors. Finally, the authors find that there is a requirement for exogenous FGF2 to initiate primitive streak-like differentiation, but endogenous FGFs, especially FGF4 and FGF17, fully take over at later stages.

      Even though some of the authors' findings - such as the localized expression of FGF ligands during gastrulation and the importance of FGF/ERK signaling for cell differentiation in the primitive streak - have been reported in model organisms before, this is one of the first studies to investigate the role of FGF signaling during primitive streak-like differentiation of human cells. In doing so, the paper reports a number of interesting and valuable observations, namely the basal localization of FGF receptors which mirrors that of BMP and Nodal receptors, as well as the existence of a positive feedback loop centered on FGF signaling that drives primitive-streak differentiation. The authors also perform a comparison of the role of different FGFs across species and try to assign specific functions to individual FGFs. In the absence of clean genetic loss-of-function cell lines, this part of the work remains less strong.

      We thank the reviewer for emphasizing the value of our findings in a human model for gastrulation. We agree more loss-of-function experiments would provide further insight into the role of different FGFs, and we plan to provide additional data along these lines in the revised manuscript.

    1. Author Response

      We thank the reviewers and editorial team for the positive reaction to our paper and for the constructive recommendations and comments on our work. Here we provide a brief provisional response to key points that were identified. We will give a detailed point-by-point response with highlighted changes in our manuscript when we upload the revised version of our paper.

      Reviewer 1:

      Statistical evaluation of the null

      In Experiment 2, we inferred the existence of a null effect of image category on suppression depth based on frequentist statistics. At the reviewer’s suggestion, we performed a statistical evaluation of the evidence in favour of the null effect using a Bayesian repeated measures ANOVA implemented in JASP. That analysis provides strong evidence for the null (BF01 = 20.38) and will be included in the final version of the paper.

      Likelihood of exceptional cases

      We acknowledge that our selection of categories is only a sampling of possible categories to which our novel tCFS method can be applied for deriving suppression depth. Other possibilities that come to mind include objects that emerge from specific configurations of simple 'tokens' such as dots (such as actions defined by biological motion (Watson et al., 2004)) or different shaped tokens configured to generate pareidolia faces (Zhou et al., 2021). We will expand on the possibility of these exceptional cases impacting bCFS and reCFS thresholds in the discussion of our revised manuscript.

      Reviewer 2:

      In response to the claim “the paper overreaches by claiming breakthrough thresholds are insufficient for drawing certain conclusions about subconscious processing.”

      We agree that breakthrough thresholds can provide useful information to draw conclusions about unconscious processing – as our procedure is predicated on breakthrough thresholds. Our key point is that breakthrough provides only half of the needed information and will amend our manuscript accordingly. In so doing, we will also shift our focus toward the influence of semantics and low-level factors, including discussion of the possibility that suppression depth and bCFS thresholds could be driven by statistically orthogonal factors.

      Reviewer 3:

      On the appropriateness of log-transformed contrast

      Our motivation to quantify suppression depth after log-transform to decibel scale was two-fold. First, we recognised that the traditional use of a linear contrast ramp in bCFS is at odds with the well-characterised profile of contrast discrimination thresholds which obey a power law (Legge, 1981) and the observations that neural contrast response functions show the same compressive non-linearity in many different cortical processing areas (e.g.: V1, V2, V3, V4, MT, MST, FST, TEO. See Ekstrom et al., 2009). Increasing contrast in linear steps could thus lead to a rapid saturation of the response function, which may account for the overshoot that has been reported in many canonical bCFS studies. For example, in Jiang et al. (2007), target contrast reached 100% after 1 second, yet average suppression times for faces and inverted faces were 1.36 and 1.76 seconds respectively. As contrast response functions in visual neurons saturate at high contrast, the upper levels of a linear contrast ramp have less and less effect on the target's strength. This approach to response asymptote may have exaggerated small differences between stimulus conditions and may have inflated some previously reported differences. In sum, the use of a log-transformed contrast ramp allows finer increments in contrast to be explored before saturation, a simple manipulation which we hope will be adopted by our field.

      Second, by quantifying suppression depth as a decibel change, we enable the comparison of suppression depth between experiments and laboratories, which inevitably differ in presentation environments. As a comparison, a reaction-time for bCFS of 1.36 s cannot easily be compared without access to near-identical stimulation and testing environments. In addition, once ramp contrast is log-transformed it effectively linearises the neural contrast response function. This means that different studies that use different contrast levels for masker or target can be directly compared because a given suppression depth (for example, 15 dB) is the same proportionate difference between bCFS and reCFS regardless of the contrasts used in the particular study.
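
      The point that a given suppression depth in dB reflects the same proportionate difference regardless of absolute contrast can be illustrated with a short sketch. It assumes the amplitude convention dB = 20 × log10(contrast), which this response does not state explicitly; the function names and the threshold contrasts are hypothetical.

```python
from math import log10


def contrast_to_db(contrast):
    """Express a contrast (0 < contrast <= 1) in dB, assuming 20 * log10(contrast)."""
    return 20 * log10(contrast)


def suppression_depth_db(bcfs_contrast, recfs_contrast):
    """Suppression depth as the dB difference between bCFS and reCFS thresholds."""
    return contrast_to_db(bcfs_contrast) - contrast_to_db(recfs_contrast)


# Two hypothetical studies with different absolute contrasts
# but the same 5:1 ratio between the bCFS and reCFS thresholds.
depth_study_a = suppression_depth_db(0.50, 0.10)
depth_study_b = suppression_depth_db(0.05, 0.01)
```

      Both hypothetical studies yield the same depth of roughly 14 dB because the difference of two logarithms depends only on the ratio of the two thresholds, not on their absolute values.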

      We also acknowledge that different stimulus categories may engage neural and visual processing associated with different contrast gain values (e.g., magno- vs parvo-mediated processing). But the breaks and returns to suppression of a given stimulus category would be dependent on the same contrast gain function appropriate for that stimulus which thus permits their direct comparison. Indeed, this is why our novel approach offers a promising technique for comparing suppression depth associated with various stimulus categories (a point mentioned above). Viewed in this way, differences in actual durations of break times (such as we report in our paper) may tell us more about differences in gain control within neural mechanisms responsible for processing of those categories.

      Consider that preferential processing could shift both bCFS and reCFS thresholds together

      This is related to the point raised in the previous comment. A stimulus that is preferentially processed (such as a face) could have lower bCFS and reCFS thresholds than other stimuli such that it emerges into awareness at a lower contrast but also remains visible at lower contrasts. We plan to address this interpretation of our data in our revised discussion and highlight that this type of preferential processing could well occur, and yet could still produce the same uniform suppression depth.

      Can the effect of contrast ramp be explained by slower RTs?

      A 500 ms reaction time estimate would not account for the magnitude of the changes observed in Experiment 3. Suppression depths in our slow, medium, and fast contrast ramps were 9.64 dB, 14.64 dB and 18.97 dB, respectively (produced by step sizes of 0.035, 0.07 and 0.105 dB per video frame at 60 fps). At each rate, assuming a 500 ms reaction time for both thresholds (1 second total) would capture a change of 2.1 dB, 4.2 dB and 6.3 dB, respectively. A reaction-time contribution of this size cannot account for the effects observed between our different ramp speeds.
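
      The arithmetic behind this argument can be checked with a short sketch. The step sizes, frame rate, and suppression depths are those quoted above; subtracting the reaction-time contribution from the measured depth is a simplifying assumption made only for illustration, and the variable names are hypothetical.

```python
FRAME_RATE_HZ = 60   # video frames per second
RT_WINDOW_S = 1.0    # 500 ms assumed at each of the two thresholds

# ramp name -> (step size in dB per frame, measured suppression depth in dB)
ramps = {
    "slow": (0.035, 9.64),
    "medium": (0.07, 14.64),
    "fast": (0.105, 18.97),
}

def rt_contribution_db(step_db_per_frame: float) -> float:
    """Contrast change (dB) accumulated during the assumed reaction-time window."""
    return step_db_per_frame * FRAME_RATE_HZ * RT_WINDOW_S

summary = {}
for name, (step, depth) in ramps.items():
    rt_db = rt_contribution_db(step)
    # Depth left over once the assumed reaction-time contribution is removed.
    summary[name] = (round(rt_db, 2), round(depth - rt_db, 2))
    print(name, summary[name])
```

      Under these assumptions the remaining depth still grows with ramp speed, so reaction time alone does not remove the ramp-rate effect.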

      Non-zero switch rate probability affecting ramping

      We agree that for a given ramp speed there is a variable probability of a switch in perceptual state for both bCFS and reCFS portions of the trial. In other words, for a given ramp speed and a given observer, the distribution of durations at which transitions occur will exhibit variance. We see that variance in our data (just as it is present in conventional binocular rivalry duration histograms), for example as a non-zero probability of switches at very short durations. One might surmise that slower ramp speeds would afford more opportunity for stochastic transitions to occur and that the measured suppression depths for slow ramps are underestimates of the suppression depth produced by contrast adaptation. Yet by the same token, the same underestimation would occur during fast ramp speeds, indicating that the difference may be even larger than we reported. In our revision we will spell this out in more detail, and indicate that a non-zero probability of switches at any time may lead to an underestimation of all recorded suppression depths.

      In our data, we believe the contribution of these stochastic switches is minimal. Our current Supplementary Figure 1(d) indicates that there is a non-zero probability of responses early in each ramp (e.g. durations < 2 seconds), yet these are a small proportion of all percept durations. This small proportion is clear in the empirical cumulative distribution function of percept durations, which we include in Author response image 1, and will address in our detailed response. Notably, during slow-ramp conditions, average percept durations actually increased, implying a resistance to any effect of early stochastic switching. We plan to expand on our analysis of these reaction-time differences in our revised manuscript.

      Author response image 1.
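
      For readers unfamiliar with this kind of plot, the following is a minimal sketch of how an empirical cumulative distribution of percept durations gives the proportion of early switches. The duration values are hypothetical and are not taken from the study; the helper `ecdf` is likewise illustrative.

```python
from bisect import bisect_right

def ecdf(samples):
    """Return a function giving the fraction of samples less than or equal to t."""
    ordered = sorted(samples)
    n = len(ordered)

    def cdf(t: float) -> float:
        # bisect_right counts how many sorted samples are <= t
        return bisect_right(ordered, t) / n

    return cdf

# Hypothetical percept durations in seconds, for illustration only.
durations_s = [0.8, 1.5, 3.2, 4.1, 4.4, 5.0, 5.6, 6.3, 7.1, 8.9]

cdf = ecdf(durations_s)
early_fraction = cdf(2.0)  # proportion of percepts lasting 2 s or less
print(early_fraction)
```

      Reading the curve at 2 seconds gives the proportion of percept durations that could plausibly reflect early stochastic switches.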

      The specificity of the DHO fit

      In our revised manuscript we will increase the justification for this model, and plan to include a comparison of model fits over time (as opposed to response number in the current manuscript).

      References

      Ekstrom, L. B., Roelfsema, P. R., Arsenault, J. T., Kolster, H., & Vanduffel, W. (2009). Modulation of the contrast response function by electrical microstimulation of the macaque frontal eye field. The Journal of Neuroscience: The Official Journal of the Society for Neuroscience, 29(34), 10683–10694.

      Jiang, Y., Costello, P., & He, S. (2007). Processing of invisible stimuli: advantage of upright faces and recognizable words in overcoming interocular suppression. Psychological Science, 18(4), 349–355.

      Legge, G. E. (1981). A power law for contrast discrimination. Vision Research, 21(4), 457–467.

      Watson, T. L., Pearson, J., & Clifford, C. W. G. (2004). Perceptual grouping of biological motion promotes binocular rivalry. Current Biology: CB, 14(18), 1670–1674.

      Zhou, L.-F., Wang, K., He, L., & Meng, M. (2021). Twofold advantages of face processing with or without visual awareness. Journal of Experimental Psychology. Human Perception and Performance, 47(6), 784–794.

    1. Author response:

      Data replicability

      There are no replicates contained in the manuscript. (Reviewer #1)

      We respectfully disagree with this statement. In this manuscript, we included both cell and animal replicates. For cell replicates, we analyzed over 50,000 cells using RNAscope and over 10,000 cells using RNAseq, employing two independent methods on different animals. We believe this extensive analysis is sufficient by any standards. Regarding animal replicates, we generated four different transgenic lines (two knock-in lines and two BAC transgenic lines), which is an uncommon and rigorous effort. We analyzed dozens of animals, consistently observing the expression pattern of Smim32 and its derived transgenes across multiple experiments, including crosses between transgenics and various reporter lines, which is again an uncommon and rigorous effort. These experiments were conducted on animals from different litters to ensure robustness. Additionally, our longitudinal study, which includes 13 animals harvested at two-day intervals from E16 to P20, further supports the consistency of our data.

      However, to underscore the consistency of endogenous Smim32 expression, when submitting a revised manuscript, we will present Smim32 expression levels across individuals in single-cell RNA-seq data. Furthermore, we will pool data from different transgenic animals to demonstrate interindividual variability in the claustrum of adult animals. 

      Additional examples of female mice should also be included and separately quantified. (Reviewer #1)

      We initially analyzed both males and females for one line (the Smim32-Cre knock-in line). Since we observed no differences between males and females (which we will note in the revised manuscript), we subsequently limited our analyses to males to minimize the use of animals. 

      Claustrum definition

      Weaknesses lie in poor anatomical definitions of the claustrum (and endopiriform nucleus). (Reviewer #2)

      No other orthogonal approaches were used to define the claustrum, such as retrograde neuroanatomical tracing from cortex. (Reviewer #3)

      We share the reviewers’ opinion that the claustrum (CLA) and endopiriform nucleus (EN) are poorly defined anatomically in rodent brains due to the limited development of white matter tracts. This ambiguity has led to many conflicting descriptions of CLA/EN boundaries in various papers and atlases, including those by Paxinos and the Allen Brain Institute. Notably, the Allen Institute frequently updates the shape and anatomical location of the CLA/EN in their reference atlas, resulting in different websites displaying various versions (as illustrated in rebuttal figure 1 at comparable levels of the anteroposterior brain axis). It remains uncertain which version, if any, would most effectively satisfy the entire scientific community. Indeed, after many years of working on these structures and surveying the literature, we regret to note that there is currently no consensus on the anatomical definition of the CLA and EN, even among expert laboratories using tracing or staining methods. At one end of the spectrum, some authors define the CLA as a small nucleus that could be, for example, characterized by the PV-rich plexus. At the other end, other authors consider it part of a larger complex that includes the EN and extends dorsally to the S2 cortex. Additionally, differing definitions of the core and shell regions, as well as the precise anteroposterior extent of the nucleus, further complicate the issue.

      Author response image 1.

      Comparison of CLA and EN shapes in two recent versions of the Allen brain atlas

      Given this lack of consensus, we deliberately opted for a molecular definition of the claustrum and its projection neurons. We used a set of well-documented canonical markers for the claustrum and neighboring neurons to determine the expression pattern of Smim32. The claustrum-specific markers we selected (Nr4a2, Lxn, Gnb4, Car3, etc.) have been extensively studied and allow us to distinguish claustrum projection neurons from neighboring and intermingled populations. Although none of these individual markers are exclusively specific to CLA and EN neurons, the combined expression of these markers provides greater confidence in identifying the different neuronal populations in space.

      Smim32 expression is used to define claustrum anatomical boundaries, rather than first using several structural, molecular, and connectivity lines of evidence to define the claustrum anatomically and then to assess whether Smim32 expression fits within this anatomical definition. (Reviewer #2)

      Contrary to the reviewer's suggestion, we do not define the claustrum based on Smim32 expression. Instead, Figures 1 and 2 demonstrate that Smim32 expression is highly correlated with the expression of known claustrum markers (Nr4a2, Lxn, Gnb4, Car3, etc.), both regionally and at the cellular level. As suggested by Peng et al. (2021, Fig. 4 and Extended Data Fig. 11), this population of cells, which includes the claustrum, a specific subset of cells in cortical layer 6, and the dorsal endopiriform nucleus, forms a discrete group of neurons sharing the same transcriptomic identity. Given what is known about the connectivity of claustrum and endopiriform nucleus projection neurons, this population obviously includes neurons projecting to various areas, likely fulfilling distinct functions. Whether these cells should be subdivided based on projection area, developmental origin, or structural features is beyond the scope of this article.

      Specificity issues

      Cre/Flp expression driven by the Smim32 promoter is present in non-claustrum regions, including the neighboring cortex, striatum, and endopiriform nucleus as well as the more distant thalamic reticular nucleus. (Reviewer #2)

      The Smim32 gene is not specific to the claustrum. (Reviewer #3)

      We do not claim that endogenous Smim32 expression is exclusive to the claustrum or that the knock-in lines, by themselves, are sufficient to isolate claustrum neurons without combined approaches based on the transgenic lines presented here. However, there are significant differences in the expression pattern between endogenous Smim32 and the expression of Cre in the various derived transgenic lines, which might not have been clear in the current manuscript. Notably, there is no expression of Cre in the striatum and the thalamic reticular nucleus, and only sparse expression in the endopiriform nucleus in Tg61(Smim32-cre). Each transgenic line provides different levels of overlap with the endogenous Smim32 expression, with the Tg61(Smim32-cre) line allowing for the most specific genetic access to claustrum neurons. Again, for greater specificity, any of these lines could be used in combined approaches, such as viral targeting (as shown in Figure 6A and B) or using transgenic intersectional (dual recombinase) approaches based on Cre- and Flp-expressing mice with an overlap in the claustrum, leading to circuit-specific and/or claustrum-only labeling.

      This means that our claims are supported by the observed data. However, we acknowledge that we may not have clearly explained the specificity of the random transgenes, which could have led some reviewers to believe that “the data do not support the claims”.

      We will clarify these points in the revised manuscript and include additional examples and quantifications to highlight the differences between endogenous Smim32 expression and Cre expression in the transgenic Tg61(Smim32-cre) line.

      Regarding Cre-expressing cells in the neighboring cortex (layer 6 projection neurons), these cells are genetically distinct from other layer 6 cortical neurons and express the same canonical markers as claustrum projection neurons, likely sharing also the same transcriptomic identity. We will provide a more detailed characterization of these cells in the revised manuscript.

      Since Smim32 driven recombinase (in 61 or 62lrod) is not exclusively expressed in the claustrum, it is not clear how Smim32 is an advantage over possible Nr4a2 or, the more selective, GNB4 Cre driver lines. (Reviewer #2)

      Over the years, we have found a limited number of Cre lines used in the literature for targeting claustrum neurons. These include Gnb4-cre, Slc17a6-cre (also known as Vglut2-cre), Egr2-cre, Tg(Tbx21-cre), Ntng2-cre, Cux2-cre and Esr2-cre lines. We have not found any study describing and/or using an Nr4a2-cre line. Although a Nr4a2-Dre line exists (that we have studied in our laboratories), caution is warranted in its use, as it lacks the complete coding sequence of the Nr4a2 gene.

      One problem with Nr4a2 is its documented expression in the adjacent Layer 6b cortical neurons, which discards it as a suitable candidate to selectively target the claustrum. Furthermore, Nr4a2 is also expressed in a majority of the endopiriform nucleus neurons, whereas endogenous Smim32 is expressed in a smaller proportion of these cells, and is restricted mainly to the dorsal endopiriform nucleus. These reasons led us to select Smim32 over Nr4a2.

      Author response image 2.

      (A) In situ hybridization for various CLA/EN marker genes. (B) Developmental recombination observed outside the CLA/EN in various cre lines (all data from the Allen brain databases)

      What are the advantages of using the different Smim32-cre lines over the existing Cre lines mentioned above?

      Let’s first consider the Gnb4-cre line, which is considered one of the best available. Although the endogenous Gnb4 gene appears to have a similar expression pattern to Nr4a2, Slc17a6, and Smim32 in the striato-claustro-insular region of adult mice (Rebuttal Figure 2A), the results observed with the Gnb4-cre line either show otherwise or indicate that the Cre line does not fully recapitulate Gnb4 endogenous expression (Rebuttal Figure 3). Indeed, some neurons in the insular cortex, piriform cortex, and putamen express the Cre recombinase (possibly due to low Gnb4 expression not detected in the in situ hybridization data of the Allen Brain Institute or due to nonspecific transgene expression) and will recombine viral vectors injected in adult mice (Rebuttal Figure 3). Therefore, this Cre expression outside the CLA/EN neurons in the Gnb4-cre line presents complications for data interpretation, depending on the viral injection coordinates and the quantity of injected vectors.

      Author response image 3.

      Specificity of the Gnb4-Cre line tested with viral transduction in adult mice (all data from the Allen Brain Institute database). The top and middle rows display the same data but with different scaling of the lookup tables to highlight either the patterns of axonal projections (top) or the infected neurons themselves (middle). The bottom row shows a higher magnification of the infection site. Note that individual neurons cannot be resolved in experiment 485903475 due to signal saturation.  

      Cre expression in the CLA appears more specific in the various Smim32-cre transgenic lines than in many of the lines mentioned above. Although we have no doubt that the different existing transgenic lines can target CLA neurons, the selectivity of the targeting (for example, the fraction and types of CLA neurons versus potential non-CLA neurons) remains to be fully described for most of the lines. It is particularly true in the case of Tbx21 and Esr2 (used as drivers for the Tg(Tbx21-cre) and Esr2-cre transgenic lines). Tbx21 is not endogenously expressed in adult CLA neurons (evaluated by in situ and RNAseq data) and Egr2, if expressed in the claustrum, is not restricted to CLA neurons as it is an immediate early gene expressed in recently active neurons (Rebuttal Figure 2A). 

      Cre expression in the EN is observed in all Cre-expressing transgenic lines used to target the claustrum (with the exception of Slc17a6-cre). This can naturally be problematic for some approaches. Luckily, the random integrant Tg61(Smim32-cre) we describe in our manuscript shows a strong expression in the claustrum, and very limited expression outside the CLA (a very weak activity in the EN), representing a novel tool with improved claustrum selectivity. An advantage of the Tg61(Smim32-cre) over the Slc17a6-cre is that more CLA neurons can be targeted with the Tg61(Smim32-cre) line. 

      Another advantage of our four transgenic lines is their versatility; they can be used to recombine reporter lines as well as FRT-flanked and loxP-flanked knockout alleles in limited neuronal populations. They will be employed in the future for intersectional genetics to exclusively target CLA neurons. Existing transgenic lines cannot offer these possibilities because their marker genes are broadly expressed in the brain during embryogenesis, which affects a large number of non-CLA/EN neurons. This is evident in the Gnb4-cre and Slc17a6-cre lines crossed with the Ai14 reporter line expressing the fluorescent protein tomato (Rebuttal Figure 2B, right panels). Similar observations have been made for the Ntng2-Cre and Cux2-cre lines (see the Allen Brain Institute database for these data). Alternatively, inducible recombinase systems, such as the Gnb4-IRES2-CreERT2-D line, could be used. However, the Gnb4-IRES2-CreERT2-D line requires tamoxifen to induce Cre recombination, which can be problematic depending on the research context, and it also shows recombination in the absence of tamoxifen treatment (see experiments 560948627 and 560948194 in the Allen Brain database).

      It is unclear how Smim32 relates to claustrum in other mammalian species (e.g. primates) (Reviewer #3)

      As mentioned in the last paragraph of the introduction of the initial manuscript, Smim32 is specifically expressed in the claustrum of a primate species, Homo sapiens (reference 37 of the initially submitted manuscript).

      Availability of the transgenic mice

      These mice should be made available to the community through commercial vendors. (Reviewer #1 and #2 in private comments)

      We are pleased to see that two of the three reviewers would like to see these mice available. We will not keep these mice to ourselves and will distribute them, but this will naturally occur after the publication of the revised manuscript.

      Critical comments on discussion and other topics

      A clear description of the search in the Allen Mouse Brain Atlas is missing. A search for Smim32 in the ISH mouse atlas did not provide any hits and so it would be useful to include in the methods or results section the exact query used for examination of Smim32 expression as well as other genes identified in this process. (Reviewer #2)

      Smim32 has been referred to by different names in various versions of the mouse genome. For the readers not versed in navigating genomes and annotations, before being officially named Smim32, this gene was originally called Gm6753 (as noted in the Allen Brain Institute database, see Rebuttal Figure 2A for an example of their in situ data) and later Gm45623.

      Several sentences highlighting the shortfalls of other approaches are overstated and should be toned down. (Reviewer #1)

      Very concerning is problematic language in the abstract and introduction sections that diminish the impact of several published studies (not cited) that have led to important findings regarding claustrum function. The authors create an argument that all the research performed thus far on the claustrum is unreliable because targeting the structure has been sub-optimal. (Reviewer #2)

      A more balanced discussion of the strengths and weaknesses of these mice should be included. (Reviewer #1)

      We regret if our choice of language inadvertently appeared to undermine the contributions of our colleagues; that was certainly not our intention. The paragraph in question was meant to address certain studies that we believe have led to inconsistent findings and unreliable data due to a lack of rigorous methodology in targeting claustrum projection neurons. To avoid singling out specific works, we chose not to cite them directly. We understand that some colleagues whose research does not fall under the “various cases” mentioned may feel unfairly targeted by this statement. We will revise this section to better clarify our intent and ensure it is respectful of all contributions. We will rephrase passages in the abstract, introduction, and discussion to provide a balanced view of the strengths and weaknesses of these mice.

      Our main goal is to provide tools to specifically target claustrum cells based on their transcriptomic identity, which we believe is the best means to assess the function of any neuronal population. Due to the intermingling of claustrum neurons with neighboring populations, employing stereotaxic injections in the claustrum without genetic segregation will always infect and label physically adjacent cells that do not belong to the claustrum, ontologically and functionally speaking. 

      Similarly, targeting claustrum neurons retrogradely by injecting into claustrum projection sites likely labels neurons from different populations. For instance, in Erwin et al. (2021), a study mentioned by reviewer 1, infecting retrosplenial projections without genetic specificity labels many claustrum Synpr+ neurons (considered the claustrum core), a small proportion of claustrum Nnat+ neurons (considered the claustrum shell by some, and non-claustrum neurons by others), and some neighboring cortical L6b neurons. These three populations have very different transcriptomic identities, connectivity patterns, and likely distinct functions.

      Thus, we believe that genetic specificity provides an important added value for selectively targeting the claustrum or claustro-insular complex.

      A better characterization of all data should be undertaken. (Reviewer #1)

      Having generated hundreds of transgenic lines over the years, we have never performed a more thorough analysis of transgenic lines, nor do we recall reading a publication that evaluates the expression pattern of transgenes in mice at such a precise level. We therefore do not see exactly what the reviewer means by this remark. It is possible, as we are not native English speakers, that we did not grasp a certain form of joke.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Paturi et al. present a detailed structural and mechanistic study of the DRB7.2:DRB4 complex in plants, focusing on its role in sequestering endogenous inverted-repeat dsRNA precursors and inhibiting Dicer-like protein 3 (DCL3) activity. By truncating the two proteins, they systematically identify the domains involved in direct interaction between DRB7.2 and DRB4 and study the interactions between the two using biophysical techniques (ITC and NMR). They show using NMR that the interacting domains between the two proteins are likely partially unfolded or aggregated in the absence of the binding partner, and they determine the NMR structure of the individual interacting domains in the presence of the isotopically unlabelled partner using sparse restraint data combined with Rosetta. They also determine the complex structure of the interacting DRB7.2 dsRBD domain and the DRB4 D3 domain using X-ray crystallography.

      Strengths:

      Overall, the manuscript is well written, provides molecular details at high resolution between the interaction of DRB7.2 and DRB4, and the data in the manuscript strongly supports the proposed model where DRB7.2:DRB4 complex sequesters the DCL3 substrates inhibiting its function of producing epigenetically activated siRNAs.

      Weaknesses:

      Major comments:

      (1) The manuscript, unfortunately, completely lacks functional validation of the determined DRB7.2:DRB4 complex structure, which is required for the rigorous validation of the proposed model. For functional validation of the determined structures, the author should at least present the mutational analysis (impact on complex formation, RNA affinity) of the point mutants derived from the structure of the DRB7.2:DRB4 complex.

      We thank the reviewer for pointing out a crucial aspect that is missing from our manuscript. Following the inputs and experiments proposed above, we will perform additional mutational analysis to determine the impact on heterodimeric complex formation and to identify the key residues involved in RNA binding.

      We expect that we can accomplish this study in the next ~ 4-6 months as we may have to create a combination of mutations for residues involved in the dimerization interface, namely, T131, V132, E134, F136, W156, and V161 on DRB7.2M. Having said that, the disruption of the heterodimer interface would probably lead to DRB7.2M and DRB4D3 returning to their fast-intermediate timescale exchanging native homo-oligomeric state/partially folded state.

      For dsRNA binding, six residues (i.e., A85 and K86 (a1), H112 and K114 (b1-b2 loop), and K142 and K144 (a2)) involved in the RNA binding interface and a few other residues based on the mutational data will be considered.

      (2) The proposed model shows the DRB7.2M and DRB4D3 as partially folded/aggregated proteins in the absence of the complex, understandably from the presented NMR data of the individual domains. However, in the cellular context, when the RNAs are present, especially DRB7.2M might be properly folded/not aggregated. Could the authors support or negate this by showing the <sup>15</sup>N HSQC spectrum of DRB7.2M in complex with the 13 bp dsRNA?

      While we have no direct proof that DRB7.2M might be folded/not aggregated in the presence of RNAs in the cellular context, the in vitro NMR-based titration studies of DRB7.2 alone (Author response image 1A) and with two molar equivalents of 13 bp dsRNA (Author response image 1B and 1C) indicate that there is no change in overall spectral pattern (except for the apparent chemical shift perturbations as expected from fast-intermediate exchange timescale binding of DRB7.2M with 13 bp dsRNA), implying that the dsRNA alone is neither necessary nor sufficient to disrupt the native fast exchange oligomeric states sampled by individual DRB7.2 and DRB7.2M.

      Author response image 1.

      DRB7.2M binding interaction with 13bp dsRNA (A) 1H-15N TROSY-HSQC of U[15N, 2H] DRB7.2M. (B) 1H-15N TROSY-HSQC of U[15N, 2H] DRB7.2M in the presence of 13 bp dsRNA with 1:2 molar equivalence. (C) An overlay of (A) and (B) indicates no evident changes in the broadening of resonances. (D) The 15N linewidth analysis of unbound (red) and bound (green) forms of U[15N, 2H] DRB7.2M resonances for which the assignment could be traced from the assignments of the DRB7.2M:DRB4D3 complex.

      Furthermore, the line-width analysis, shown in Author response image 1D, implies that the ~R<sub>2</sub> rates are roughly identical in the presence of dsRNA, indicating that the native oligomeric state of DRB7.2M remains unperturbed by the presence of dsRNA. Our observation also corroborates with the crystal structure presented in the manuscript, where we have observed that the hetero-dimeric interface lies on the opposite side of the dsRNA binding interface of the DRB7.2M:DRB4D3 complex.

      Therefore, the dsRNA substrate does not have any role in the native partially folded/oligomeric state of DRB7.2M.

      (3) It remains unclear from the manuscript if DRB7.1 will have a similar or different mechanism of interaction with DRB4. Based on the sequence comparisons of the two proteins, the authors should comment on this in the discussion section.

      Pairwise sequence alignment of full-length DRB7.2 and DRB7.1 with EMBOSS Needle reveals 50.7% similarity and 33.2% identity (Author response image 2).

      Author response image 2.

      ClustalW alignment of full-length DRB7.2 and DRB7.1. The secondary structure elements are derived from the crystal structure of DRB7.2M (PDB ID: 8IGD). Identical residues are marked with red highlights, whereas similar residues are marked with yellow highlights, and the consensus residues (> 50%) are annotated below the sequence alignment.

      As expected, for the dsRBD region (corresponding to DRB7.2M), we observe a much higher degree of alignment, with 76.7% similarity and 54.7% identity (Author response image 3).

      Author response image 3.

      ClustalW alignment of the dsRBD region of DRB7.2 and DRB7.1. The secondary structure elements are derived from the crystal structure of DRB7.2M (PDB ID: 8IGD). Identical residues are marked with red highlights, whereas similar residues are marked with yellow highlights, and the consensus residues (> 50%) are annotated below the sequence alignment.
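
      To make the identity figures concrete, the sketch below shows one way percent identity can be computed from a gapped pairwise alignment. It assumes identity is counted over all alignment columns, gaps included, which may differ from the exact convention of the tool used; the two sequences are hypothetical toy strings, not DRB7.1 or DRB7.2.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns holding the same residue in both sequences.

    Assumption: the denominator is the full alignment length, gaps included.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)

# Hypothetical aligned toy sequences ('-' marks a gap), for illustration only.
seq_a = "MKT-VLEAFW"
seq_b = "MRTAVLE-FW"

identity = percent_identity(seq_a, seq_b)
print(identity)
```

      Percent similarity additionally requires a substitution matrix to decide which non-identical residue pairs count as similar, so it is not reproduced in this sketch.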

      Moreover, the residues involved in the heterodimerization interface of DRB7.2M, namely T131, V132, E134, F136, W156, and V161, are unchanged in DRB7.1, suggesting that DRB7.1M may interact with DRB4D3 in a manner similar to that illustrated for DRB7.2M:DRB4D3 in the manuscript.

      Future studies will shed more light on the binding preference of DRB4D3 with DRB7.1 versus DRB7.2. One interesting thing to note is that DRB7.2 is exclusively present in the nucleus, whereas DRB7.1 is observed to localize in the nucleus as well as the cytoplasm. Therefore, spatial restriction may be one of the mechanisms that bring exclusivity to the interaction partner despite having a conserved interaction interface.

      Minor comments:

      (1) There are no errors for the N, dH, and dS values of the ITC measurements in Table 1. Also, it seems that the measurements are done only once. Values derived from at least triplicates should be presented. This would be helpful to increase confidence in the values derived from ITC, especially for the titration between DRB7.2, DRB4C, and DRB4D3, as the N value there is substantially lower than 1, which does not agree with the other data.

      We plan to estimate the errors as proposed by the reviewer in the revised manuscript to ensure that the presented data is of high confidence.

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Paturi and colleagues uses an approach that combines structural biology and biochemistry to probe protein-protein and protein-RNA interactions for two protein factors related to the dsRNA pathway in plants.

      Strengths:

      A key finding in the research is the direct demonstration of the ability of the single dsRBD (double-strand RNA binding domain) of DRB7.2 to interact simultaneously with dsRNA as well as the C-terminal domain of DRB4. The heterodimerization of DRB7.2 and DRB4 is demonstrated to make a high-affinity complex with dsRNA, and it is proposed that this atypical use of the dsRBD domain to bridge the protein and RNA may contribute to the ability to prevent cleavage that would otherwise occur for dsRNA. The primary results for the interactions are generally well-supported by the data, and the conclusions are taken from the available results without excessive speculation.

      Weaknesses:

      There is a need for some statistical repeats, as well as a suggested movement of many protein characterization findings in the solution state to support data or to better indicate how these properties could play a role in the final proposed mechanism. There is also the need for certain measurement replicates, such as for the ITC data, which are derived from single measurements and lack sufficient estimates of error.

      We plan to restructure the manuscript along the lines proposed by the reviewer in the revised version. Moreover, as mentioned in our response to the comments of Reviewer 1, we plan to estimate the errors in the revised version so that the presented data are of high confidence.

    1. Author Response

      We thank the reviewers for their useful and constructive comments. In this provisional response, we will address a few of the major issues and plan to submit a detailed, point-by-point response along with the revised manuscript.

      1. Robustness of activated combination of neurons (the ‘activated ensemble’).

      The reviewers have asked for additional analyses and visualization of the group of neurons activated and a classification analysis to illustrate the point that the activated set of neurons would allow discrimination between different concentrations even after the spiking activity reduced significantly in the later trials. We relied on visualization using PCA (Manuscript Fig. 4) and quantification using correlation analysis (Manuscript Fig. 5a and Manuscript Supplementary Figure 2). But this point can be easily amplified further to support our conclusions and address a major concern raised by the reviewers.

      Visualization of neural responses across trials and odorants: As recommended, we followed the procedures used in Stopfer et al., 2003 (Fig. 6c) and Miura et al., 2012 (Fig. 3C) to image neural responses across recorded PNs as a matrix (Author response image 1).

      Author response image 1.

      Author response image 1: Spike counts averaged over the entire 4 s odor presentation window across all recorded neurons are shown as a function of trial number (columns). The sorting is the same across different panels. Note that there are 80 neurons whose response was monitored for hexanol and octanol responses (Dataset 1; first row of panels), and 81 neurons whose response was monitored for isoamyl acetate and benzaldehyde (Dataset 2; second row of panels). As can be noted, across the 25 trials the pattern of activation remains consistent. Also, the activated combination of neurons varied robustly with odor identity and intensity.

      Classification analysis: To illustrate that there is enough information to recognize an odorant and discriminate between different intensities, we performed a leave-one-trial-out classification analysis. The left-out trial was assigned the class label of its nearest neighbor (using correlation distance metric). The results from this classification analysis are shown below in Author response image 2. As a control, we shuffled the odor class labels and repeated the leave-one-trial-out classification analysis.

      Author response image 2.

      Author response image 2: Results from classification analysis are shown for the two datasets: hexanol–octanol at different concentrations (dataset 1; 80 PNs), and isoamyl acetate and benzaldehyde (dataset 2; 81 PNs). We did a leave-one-trial-out validation. The true odor label is shown along the x-axis and the predicted odor label is shown along the y-axis. As can be noted, the class labels for every single trial were correctly predicted in both datasets. The result after class labels were shuffled is also shown for comparison. These results strongly support our conclusion that odor intensity information is preserved and odor concentration can be recognized independent of adaptation.
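      The leave-one-trial-out procedure described above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the code used for the analysis: the simulated response patterns, the assumed decay of response gain across trials, and the function names (`correlation_distance`, `leave_one_trial_out`, `simulate_trials`, `accuracy`) are all hypothetical and introduced here only for illustration.

```python
import numpy as np


def correlation_distance(a, b):
    """Return 1 - Pearson correlation between two population vectors."""
    a = a - a.mean()
    b = b - b.mean()
    return 1.0 - float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))


def leave_one_trial_out(responses, labels):
    """Label each held-out trial with the class of its nearest neighbor.

    responses: (n_trials, n_neurons) spike-count vectors, one row per trial.
    labels:    (n_trials,) class label (odor and intensity) of each row.
    """
    responses = np.asarray(responses, dtype=float)
    labels = np.asarray(labels)
    predicted = np.empty_like(labels)
    for i in range(len(labels)):
        # Distance to every other trial; the held-out trial itself is excluded.
        dists = np.array([
            np.inf if j == i else correlation_distance(responses[i], responses[j])
            for j in range(len(labels))
        ])
        predicted[i] = labels[np.argmin(dists)]
    return predicted


def simulate_trials(n_classes=4, n_trials=25, n_neurons=80, seed=0):
    """Hypothetical data: one fixed pattern per class, shrinking over trials."""
    rng = np.random.default_rng(seed)
    templates = rng.gamma(shape=2.0, scale=5.0, size=(n_classes, n_neurons))
    gains = np.linspace(1.0, 0.4, n_trials)  # assumed adaptation profile
    responses, labels = [], []
    for c in range(n_classes):
        for g in gains:
            noise = rng.normal(0.0, 0.5, size=n_neurons)
            responses.append(g * templates[c] + noise)
            labels.append(c)
    return np.array(responses), np.array(labels)


def accuracy(responses, labels):
    """Fraction of held-out trials whose predicted label matches the given one."""
    return float(np.mean(leave_one_trial_out(responses, labels) == labels))


if __name__ == "__main__":
    X, y = simulate_trials()
    shuffled = np.random.default_rng(1).permutation(y)  # shuffled-label control
    print("true labels:", accuracy(X, y))
    print("shuffled labels:", accuracy(X, shuffled))
```

      In this sketch the shuffled-label control reuses the same response matrix and only permutes the class labels, so any accuracy above chance under the true labels reflects structure in the responses rather than in the classifier.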

      Correlation with the first trial:

      We had shown the correlation across odorants and concentrations as a function of the trial (manuscript Figure 5A). To complement these analyses, here we focus on the correlations with the response evoked in the first trial of each odorant at high and low concentrations and plot this information as a function of trial number (Author response image 3, 4). As can be noted, the correlation across different trials of a given odorant at specific concentrations remains much higher than all other conditions.

      Author response image 3.

      Author response image 3: (top-left) Correlation between 80-dimensional neural responses (averaged over the entire 4s odor presentation window) with the first trial of hexanol at high intensity (hex-H; 1% v/v) is plotted as a function of trial number. (top-right) similar plots but correlation computed with neural responses evoked during the first trial of octanol at high intensity (oct-H; 1% v/v). (bottom-left) similar plots but correlation computed with neural responses evoked in the first trial of hexanol at low intensity (hex-L; 1% v/v). (bottom-right) similar plots but correlation computed with neural responses evoked in the first trial of octanol at low intensity (oct-L; 1% v/v).

      Author response image 4.

      Author response image 4: (top-left) Correlation between 81-dimensional neural responses (averaged over the entire 4s odor presentation window) with the first trial of isoamyl acetate at high intensity (iaa-H; 1% v/v) is plotted as a function of trial number. (top-right) similar plots but correlation computed with neural responses evoked in the first trial of benzaldehyde at a high intensity (bza-H; 1% v/v). (bottom-left) similar plots but correlation computed with neural responses evoked in the first trial of isoamyl acetate at low intensity (iaa-L; 1% v/v). (bottom-right) similar plots but correlation computed with neural responses evoked in the first trial of benzaldehyde at low intensity (bza-L; 1% v/v).
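      The correlation-with-first-trial measure used in Author response images 3 and 4 can be illustrated with a short sketch. The data below are synthetic, and the function name `correlation_with_reference` and the assumed uniform gain reduction are hypothetical. The sketch only shows a general property of Pearson correlation: it is unchanged when a response vector is scaled by a positive constant, so a uniform reduction in spike counts would not by itself lower the correlation with the first trial. Real adaptation need not be uniform across neurons.

```python
import numpy as np


def correlation_with_reference(trials, reference):
    """Pearson correlation of each trial's population vector with a reference.

    trials:    (n_trials, n_neurons) spike counts per neuron, one row per trial.
    reference: (n_neurons,) response vector, e.g. the first trial of one odor.
    """
    trials = np.asarray(trials, dtype=float)
    reference = np.asarray(reference, dtype=float)
    t = trials - trials.mean(axis=1, keepdims=True)
    r = reference - reference.mean()
    return (t @ r) / (np.linalg.norm(t, axis=1) * np.linalg.norm(r))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pattern = rng.gamma(2.0, 5.0, size=80)   # hypothetical odor pattern
    other = rng.gamma(2.0, 5.0, size=80)     # hypothetical second odor
    gains = np.linspace(1.0, 0.4, 25)        # assumed response reduction
    print(correlation_with_reference(gains[:, None] * pattern, pattern).round(3))
    print(correlation_with_reference(gains[:, None] * other, pattern).round(3))
```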

      Behavioral significance and dynamics: The reviewers had wondered about the relevance of the behavior to the organism. The maxillary palps are sensory organs close to the mouth parts that are used to grab food and help with the feeding process. In our previous studies, we had shown that these palp-opening responses are innately triggered by many ‘appetitive odorants.’ However, the probability of palp opening varied across different odorants (Chandak and Raman, 2023). Some odorants evoked higher palp-opening responses and others reduced the probability of palp-opening response (below the median value across odorants). Since all other parameters (such as the clicking sound of valves and mechanical cues due to airflow during odor presentation) are the same across these different odorants, these observed differences in palp-opening response probability are attributed to the identity of the odorants presented.

      Author response image 5.

      Author response image 5: Preference indices were calculated for all odors tested and are shown as a bar plot (n = 26 locusts). Blue bars indicate odors classified as appetitive, gray bars indicate neutral odors and red bars indicate unappetitive odors. Locusts with a significant deviation from the median response (one-sided binomial test, P < 0.1) were classified as either being appetitive or unappetitive (P < 0.1, P < 0.05, **P < 0.01). Error bars indicate s.e.m. [Replotted Fig 1.c from Chandak and Raman, 2023].

      We had also shown that we could train locusts to have stereotyped palp-opening responses using the classical conditioning approach (odor as the conditioned stimulus and food reward as the unconditioned stimulus; Video: https://static-content.springer.com/esm/art%3A10.1038%2Fncomms7953/MediaObjects/41467_2015_BFncomms7953_MOESM483_ESM.mov; Saha et al., 2015). The dynamics of those conditioned palp-opening responses have been well characterized.

      We will use similar tracking procedures to monitor and quantify the dynamics of innate palp-opening responses as well. We will add supplementary videos to fully capture this behavior.

      Early vs. late neural responses:

      Since behavioral responses are more likely to start as soon as the odorant is presented, the reviewers wondered whether there are differences in the observed findings if we focus only on the early neural activity (as it might be more important for triggering behavior). Note that the median response time for conditioned palp-opening responses is less than 750 ms (Saha et al., 2015; Chandak and Raman, 2023). Hence, we divided the neural dataset and analyzed the neural response patterns during these early (0-750 ms after odor onset) and late (2-4 s after odor onset) time windows. In both these epochs, we found that the total spike counts across neurons reduced as a function of trial number or repetition and the combination of neurons activated remained robust (Author response images 6-11). Hence, we conclude that while the neural responses in different time windows would be important for shaping other parameters of behavioral response dynamics, the overall behavioral response probability that we used in our analysis had a similar relationship with early, late, or total neural activity during the entire odor presentation (i.e., the time window of the neural response did not matter for the analyses presented in the manuscript).

      Author response image 6.

      Author response image 6: Total spike counts reduced as a function of trial number. This reduction was observed for the total spike counts during the entire odor presentation window and during both the early (0-750 ms) and late (2-4 s) response time windows. Dataset 1: 80 PNs, hexanol, and octanol odorants.

      Author response image 7.

      Author response image 7: Total spike counts reduced as a function of trial number. This reduction was observed for the total spike counts during the entire odor presentation window and during both the early (0-750 ms) and late (2-4 s) response time windows. Dataset 2: 81 PNs, isoamyl acetate, and benzaldehyde odorants.

      Author response image 8.

      Author response image 8: Similar plots as in Figures 3 and 4 but analyzing 80-dimensional spike count vectors calculated using only the first 750 ms of odor-evoked response. Note that the correlation with the odor evoked response in the first trial remains high across trials. But between different odorants or different intensities of the same odorant, the response correlation drops significantly. Dataset 1: 80 PNs, hexanol, and octanol odorants.

      Author response image 9.

      Author response image 9: Similar plots as in Figures 3 and 4 but analyzing 80-dimensional spike count vectors calculated using only the last 2 seconds of odor-evoked response. Note that the correlation with the odor evoked response in the first trial remains high across trials. But between different odorants or different intensities of the same odorant, the response correlation drops significantly. Dataset 1: 80 PNs, hexanol, and octanol odorants.

      Author response image 10.

      Author response image 10: Similar plots as in Figures 3 and 4 but analyzing 81-dimensional spike count vectors calculated using only the first 750 ms of odor-evoked response. Note that the correlation with the odor evoked response in the first trial remains high across trials. But between different odorants or different intensities of the same odorant, the response correlation drops significantly. Dataset 2: 81 PNs, isoamyl acetate, and benzaldehyde odorants.

      Author response image 11.

      Author response image 11: Similar plots as in Figures 3 and 4 but analyzing 81-dimensional spike count vectors calculated using only the last 2 seconds of odor-evoked response. Note that the correlation with the odor evoked response in the first trial remains high across trials. But between different odorants or different intensities of the same odorant, the response correlation drops significantly. Dataset 2: 81 PNs, isoamyl acetate, and benzaldehyde odorants.
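      The early, late, and full-window spike counts analyzed in Author response images 6-11 could be computed along the following lines. This is a sketch under stated assumptions: spike times are taken to be in seconds relative to odor onset, windows are treated as half-open intervals, and the names `WINDOWS`, `count_spikes`, and `population_vector` are hypothetical.

```python
import numpy as np

# Windows in seconds relative to odor onset, taken from the text above.
WINDOWS = {"early": (0.0, 0.75), "late": (2.0, 4.0), "full": (0.0, 4.0)}


def count_spikes(spike_times, window):
    """Count spikes with start <= t < stop for one neuron on one trial."""
    start, stop = window
    t = np.asarray(spike_times, dtype=float)
    return int(np.sum((t >= start) & (t < stop)))


def population_vector(trial_spike_times, window):
    """Spike-count vector across neurons for one trial and one window.

    trial_spike_times: list with one sequence of spike times per neuron.
    """
    return np.array([count_spikes(st, window) for st in trial_spike_times])


if __name__ == "__main__":
    # Hypothetical trial with three neurons.
    trial = [[0.1, 0.5, 0.8, 2.5, 3.9], [0.2, 3.0], []]
    for name, window in WINDOWS.items():
        print(name, population_vector(trial, window))
```

      The resulting per-window vectors can then be passed to the same correlation or classification analyses that are applied to the full 4 s window.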

      Other Statistical Tests:

      The reviewers felt that many analyses lacked error bars and did not indicate the sample size, SEM, or SD. We will fix this by adding the sample size information to each panel, as appropriate. However, we would also like to point out that many of the analyses are done in a trial-by-trial fashion (e.g., Manuscript Figures 3-6). For these analyses, it would not be possible to add SEM or SD. One condition (hex-H or iaa-H) was repeated in each dataset, and we have added them in the results shown in this response letter to demonstrate repeatability. We will do our best to add these statistics where appropriate, but this cannot be done for the trial-by-trial analyses.

      References:

      Stopfer M, Jayaraman V, Laurent G. Intensity versus identity coding in an olfactory system. Neuron. 2003 Sep 11;39(6):991-1004. doi: 10.1016/j.neuron.2003.08.011. PMID: 12971898.

      Miura K, Mainen ZF, Uchida N. Odor representations in olfactory cortex: distributed rate coding and decorrelated population activity. Neuron. 2012 Jun 21;74(6):1087-98. doi: 10.1016/j.neuron.2012.04.021. PMID: 22726838; PMCID: PMC3383608.

      Chandak, R., Raman, B. Neural manifolds for odor-driven innate and acquired appetitive preferences. Nat Commun 14, 4719 (2023). https://doi.org/10.1038/s41467-023-40443-2

      Saha, D., Li, C., Peterson, S. et al. Behavioural correlates of combinatorial versus temporal features of odour codes. Nat Commun 6, 6953 (2015). https://doi.org/10.1038/ncomms7953

    1. Author response:

      We thank the reviewers for their thoughtful comments and constructive suggestions. We describe how we will address each point below and are grateful for the guidance on areas where our work could be clarified or expanded. In particular, we note the following:

      Selection scan summary statistics: In our revised manuscript, we will include summary statistics from the selection scans. We believe this addition will enhance transparency and provide additional context for readers.

      Reporting of outliers: As highlighted by the editor, the reviewers expressed differing views on the most appropriate way to report outliers. To provide a comprehensive and balanced presentation, we will report both the empirical selection statistics and the corresponding converted p-values. This dual approach will allow readers to fully interpret the results under both perspectives.

      Methodological considerations: We have carefully considered the reviewers' methodological suggestions and will incorporate them into our revisions where possible. These changes will strengthen the rigor and clarity of the analyses.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The paper reports an analysis of whole-genome sequence data from 40 Faroese. The authors investigate aspects of demographic history and natural selection in this population. The key findings are that the Faroese (as expected) have a small population size and are broadly of Northwest European ancestry. Accordingly, selection signatures are largely shared with other Northwest European populations, although the authors identify signals that may be specific to the Faroes. Finally, they identify a few predicted deleterious coding variants that may be enriched in the Faroes.

      Strengths:

      The data are appropriately quality-controlled and appear to be of high quality. Some aspects of the Faroese population history are characterized, in particular, by the relatively (compared to other European populations) high proportion of long runs of homozygosity, which may be relevant for disease mapping of recessive variants. The selection analysis is presented reasonably, although as the authors point out, many aspects, for example differences in iHS, can reflect differences in demographic history or population-specific drift and thus can't reliably be interpreted in terms of differences in the strength of selection.

      Weaknesses:

      The main limitations of the paper are as follows:

      (1) The data are not available. I appreciate that (even de-identified) genotype data cannot be shared; however, that does substantially reduce the value of the paper. Minimally, I think the authors should share summary statistics for the selection scans, in line with the standard of the field.

      We agree with the reviewer that sharing the selection scan results is important, so in the next revision of this manuscript we will make the selection scan summary statistics publicly available, and clearly lay out the guidelines and research questions for which the data can be accessed.

      (2) The insight into the population history of the Faroes is limited, relative to what is already known (i.e., they were settled around 1200 years ago, by people with a mixture of Scandinavian and British ancestry, have a small effective population size, and any admixture since then comes from substantially similar populations). It's obvious, for example, that the Faroese population has a smaller bottleneck than, say, GBR.

      More sophisticated analyses (for example, ARG-based methods, or IBD or rare variant sharing) would be able to reveal more detailed and fine-scale information about the history of the populations that is not already known. PCA, ADMIXTURE, and HaploNet analysis are broad summaries, but the interesting questions here would be more specific to the Faroes, for example, what are the proportions of Scandinavian vs Celtic ancestry? What is the date and extent of sex bias (as suggested by the uniparental data) in this admixture? I think that it is a bit of a missed opportunity not to address these questions.

      We clarify that we did quantify the proportions of various ancestry components as estimated by HaploNet in main text Figure 5 and supplemental figures S5 and S6. In our revisions, we will include the average global ancestry of the various components in the Main Text so that this result is clearer.

      We agree that more fine-scale demographic analyses would be informative. We have begun working on an estimation of the admixture date, for example, but have encountered problems with using different standard date estimation software, which give very inconsistent and unstable results. We suspect this might be due to the strong bottleneck experienced in the history of the Faroe Islands breaking one or more of the assumptions of these methods. We will continue working on this problem in coming months, possibly using simulations to assess where the problem might be. We recognize that our relatively small sample size places limits on the fine-scale demographic analyses that can be performed. We are addressing this in ongoing work by generating a larger cohort, which we hope will enable more detailed inference in the future.

      (3) I don't really understand the rationale for looking at HLA-B allele frequencies. The authors write that "ankylosing spondylitis (AS) may be at a higher prevalence in the Faroe Islands (unpublished data), however, this has not been confirmed by follow-up epidemiological studies". So there's no evidence (certainly no published evidence) that AS is more prevalent, and hence nothing to explain with the HLA allele frequencies?

      We agree that no published studies have confirmed a higher prevalence of ankylosing spondylitis (AS) in the Faroe Islands. Our recruitment data suggest that AS might be more common than in other European populations, but we understand that this is only based on limited, unpublished observations and what we are hearing from the community. We emphasized in our original manuscript that this is based on observational evidence from the FarGen project. However, as this reviewer pointed out, we can be more clear that this prevalence has not been formally studied.

      In our next revision we will clarify in the text that our recruitment data suggest a higher prevalence of AS may be possible, but more formal epidemiological studies are needed to confirm this observation. The reason we study HLA-B allele frequencies is to see if the genetic background of the Faroese population could help explain this possible difference, since HLA-B27 is already known to play a strong role in AS.

      Reviewer #2 (Public review):

      In this paper, Hamid et al present 40 genomes from the Faroe Islands. They use these data (a pilot study for an anticipated larger-scale sequencing effort) to discuss the population genetic diversity and history of the sample, and the Faroes population. I think this is an overall solid paper; it is overall well-polished and well-written. It is somewhat descriptive (as might be expected for an explorative pilot study), but does make good use of the data.

      The data processing and annotation follows a state-of-the-art protocol, and at least I could not find any evidence in the results that would pinpoint towards bioinformatic issues having substantially biased some of the results, and at least preliminary results lead to the identification of some candidate disease alleles, showing that small, isolated cohorts can be an efficient way to find populations with locally common, but globally rare disease alleles.

      I also enjoyed the population structure analysis in the context of ancient samples, which gives some context to the genetic ancestry of Faroese, although it would have been nice if that could have been quantified, and it is unfortunate that the sampling scheme effectively precludes within-Faroes analyses.

      We note that although the ancestry proportions are not specified in the main text, we did quantify ancestry proportions in the modern Faroese individuals and other ancient samples, and we visualized these proportions in Figure 5 and Supplementary Figures S5 and S6. As stated in our response to Reviewer #1, in our revisions, we will more clearly state the average global ancestry of the various components in the Main Text.

      I am unfortunately quite critical of the selection analysis, both on a statistical level and, more importantly, I do not believe it measures what the authors think it does.

      Major comments:

      (1) Admixture timing/genomic scaling/localization:

      As the authors lay out, the Faroes were likely colonized in the last 1,000-1,500 years, i.e., 40-60 generations ago. That means most genomic processes that have happened on the Faroes should have signatures that are on the order of ~1-2 cM, whereas more local patterns likely indicate genetic history predating the colonization of the islands. Yet, the paper seems to be oblivious to this (to me) fascinating and somewhat unique premise. Maybe this thought is wrong, but I think the authors miss a chance here to explain why the reader should care beyond the fact that the small populations might have high-frequency risk alleles and the Faroes are intrinsically interesting, but more importantly, it also makes me think it leads to some misinterpretations in the selection analysis.

      See response to point #3

      (2) ROH:

      Would the sampling scheme impact ROH? How would it deal with individuals with known parental coancestry? As an example of what I mean by my previous comment, 1 Mb is short enough that I would expect most/many 1 Mb ROH-tracts to come from pedigree loops predating the colonization of the Faroes (i.e., I am actually quite surprised that there isn't much more long ROH, which makes me wonder if that would be impacted by the sampling scheme).

      The sampling scheme was designed to choose 40 Faroese individuals that were representative of the different regions and were minimally related. There were no pairs of third-degree relatives or closer (pi-hat > 0.125) in either the Faroese cohort or the reference populations. It is possible that this sampling scheme would reduce the amount of longer ROHs in the population, but we should still be able to see overall patterns of ROH reflective of bottlenecks in the past tens of generations. Additionally, based on this reviewer's earlier comment, 1 Mb ROHs would still be relevant to demographic events in the last 40-60 generations given that on average 1 cM corresponds to 1 Mb in humans, though we recognize that is not an exact conversion.

      That said, the “sum total amount of the genome contained in long ROH” as we described in the manuscript includes all ROHs greater than 1 Mb. Although we group all ROHs longer than 1 Mb into one category in the current manuscript, we can look more specifically at the distribution of the longer ROH in future revisions and add discussion of what this might tell us about the timing of bottlenecks.

      For now, we share a plot of the distribution in ROH lengths across all individuals for each cohort. As this plot shows, there certainly are ROHs longer than 1 Mb in the Faroese cohort, and on average there is a higher proportion of long ROH particularly in the 5-15 Mb range in the Faroese cohort relative to the other cohorts.

      Author response image 1.

      (3) Selection scan:

      We are talking about a bottlenecked population that is recently admixed (Faroese), compared to a population (GBR) putatively more closely related to one of its sources. My guess would be that selection in such a scenario would be possibly very hard to detect, and even then, selection signals might not differentiate selection in Faroese vs. GBR, but rather selection/allele frequency differences between different source populations. I think it would be good to spell out why XP-EHH/iHS measures selection at the correct time scale, and how/if these statistics are expected to behave differently in an admixed population.

      The reviewer brings up good points about the utility of classical selection statistics in populations that are admixed or bottlenecked, and whether the timescale at which these statistics detect selection is relevant for understanding the selective history of the Faroese population. We break down these concerns separately.

      (1) Bottlenecks: Recent bottlenecks result in higher LD within a population. However, demographic events such as bottlenecks affect global genomic patterns while positive selection is expected to affect local genomic patterns. For this reason, iHS and XP-EHH statistics are standardized against the genome-wide background, to account for population-specific demographic history.

      (2) Admixture: The term “admixture” has different interpretations depending on the line of inquiry and the populations being studied. Across various time and geographic scales, all human populations are admixed to some degree, as gene flow between groups is a common fixture throughout our history. For example, even the modern British population has “admixed” ancestry from North / West European sources as well, dating to at least as recently as the Medieval & Viking periods (Gretzinger et al. 2022, Leslie et al. 2015), yet we do not commonly consider it an “admixed” population, and we are not typically concerned about applying haplotype-based statistics in this population. This is due to the low divergence between the source populations. In the case of the Faroe Islands, we believe admixture likely occurred on a similar timescale. We see low variance in ancestry proportions estimated by HaploNet, both from the historical Faroese individuals (250BP) and the modern samples. This indicates admixture predating the settlement of the Faroe Islands, where recombination has had time to break up long ancestry tracts and the global ancestry proportions have reached an equilibrium. That is, these ancestry patterns suggest that the modern Faroese are most likely descended from already admixed founders. We mention this as a likely possibility in the main text: “This could have occurred either via a mixture of the original “West Europe” ancestry with individuals of predominantly “North Europe” ancestry, or a by replacement with individuals that were already of mixed ancestry at the time of arrival in the islands (the latter are not uncommon in Viking Age mainland Europe).” And, as with the case of the British population, the closely-related ancestral sources for the Faroese founders were likely not so diverged as to have differences in allele frequencies and long-range haplotypes that would disrupt signals of selection from iHS or XP-EHH.

      (3) Time scale: It is certainly possible, and in fact likely, that iHS measures selection older than the settlement of the Faroe Islands. In our manuscript, we calculated iHS in both the Faroese and the closely related British cohort, and we highlight in the Main Text that the top signals, with the exception of LCT, are shared between the two cohorts, indicative of selection that began prior to the population split. iHS is a commonly calculated statistic, and it is often calculated in a single population without comparing to others, so we feel it is important to show our result demonstrating these shared selection signals. In future revisions, we will emphasize in the main text that we are not claiming to have identified selection that occurred in the Faroese population post-settlement with the iHS statistic. As for XP-EHH, it is a statistic designed to identify differentiated variants that are fixed or approaching fixation in one population but not others. The time-scale of selection that XP-EHH can detect would therefore be dependent on the populations used for comparison. As XP-EHH has the best power to identify alleles that are fixed or approaching fixation in one population but not others, it is less likely to detect older selection events / incomplete sweeps from the source populations.

      In our next revision, we will more clearly state limitations of these statistics under various population histories, and clarify the time-scale at which we are detecting selection for iHS vs XP-EHH.
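      As an illustration of the genome-wide standardization mentioned under point (1), the sketch below standardizes raw scores within bins of derived allele frequency, which is a widely used convention for iHS. Whether the manuscript used exactly this binning is not stated here, so the bin count, the synthetic scores, and the function name `standardize_by_frequency_bin` are assumptions made only for illustration.

```python
import numpy as np


def standardize_by_frequency_bin(raw_scores, derived_freqs, n_bins=20):
    """Z-score raw scores against the genome-wide background in each bin.

    raw_scores:    unstandardized per-variant statistic (e.g. raw iHS).
    derived_freqs: derived allele frequency of each variant, in [0, 1].
    Variants in bins with fewer than two members, or with zero spread,
    are returned as NaN.
    """
    raw = np.asarray(raw_scores, dtype=float)
    freqs = np.asarray(derived_freqs, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_idx = np.clip(np.digitize(freqs, edges) - 1, 0, n_bins - 1)
    z = np.full(raw.shape, np.nan)
    for b in range(n_bins):
        mask = bin_idx == b
        if mask.sum() > 1:
            sd = raw[mask].std()
            if sd > 0:
                z[mask] = (raw[mask] - raw[mask].mean()) / sd
    return z


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    freqs = rng.uniform(0.05, 0.95, size=5000)
    # Hypothetical raw scores whose mean and spread drift with frequency.
    raw = rng.normal(loc=2.0 * freqs, scale=1.0 + freqs)
    z = standardize_by_frequency_bin(raw, freqs)
    print(np.nanmean(z), np.nanstd(z))
```

      Because the mean and spread are estimated from all variants in a bin, a demographic effect that shifts scores genome-wide is absorbed into the background, while a locally extreme region still stands out.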

      (4) Similarly, for the discussion of LCT, I am not convinced that the haplotypes depicted here are on the right scale to reflect processes happening on the Faroes. Given the admixture/population history, it at the very least should be discussed in the context of whether the 13910 allele frequency on the Faroes is at odds with what would be expected based on the admixture sources.

      We agree that more investigation into the LCT allele frequency in the other ancient samples may provide some insight into the selection history, particularly in light of ancient admixture. Please note, we did look at the allele frequency of the LCT allele rs4988235 and stated in the main text that it was present at high frequencies in the historical (250BP) Faroese samples. The frequency of this allele in the imputed historical Faroese samples is 82% while the allele is present at ~74% frequency in modern samples. We did not report the exact percentage in the main text because the sample size of the historical samples (11 individuals) is small and coverage of ancient samples is low, leading to potential errors in imputation. However, we can try to calculate the LCT allele frequency in other ancient samples, and assuming that we have good proxies for the sources at the time of admixture, we may calculate the expected allele frequency in the admixed ancestors of the Faroese founders in the next revision.
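      The expected allele frequency in an admixed population is the ancestry-weighted average of the source frequencies, and the small historical sample can be put in context with a simple binomial sampling error. The sketch below is illustrative only: the function names are hypothetical, the ancestry proportions and source frequencies in the usage example are made-up placeholders, and the standard error ignores imputation error and low coverage, which the response notes are additional sources of uncertainty.

```python
import math


def expected_admixed_frequency(ancestry_proportions, source_frequencies):
    """Ancestry-weighted mean allele frequency across source populations."""
    if len(ancestry_proportions) != len(source_frequencies):
        raise ValueError("one frequency is needed per ancestry component")
    if not math.isclose(sum(ancestry_proportions), 1.0, abs_tol=1e-9):
        raise ValueError("ancestry proportions must sum to 1")
    return sum(a * p for a, p in zip(ancestry_proportions, source_frequencies))


def binomial_standard_error(frequency, n_individuals):
    """Sampling standard error of an allele frequency in diploid individuals."""
    n_chromosomes = 2 * n_individuals
    return math.sqrt(frequency * (1.0 - frequency) / n_chromosomes)


if __name__ == "__main__":
    # Placeholder inputs, NOT estimates from the manuscript.
    print(expected_admixed_frequency([0.6, 0.4], [0.8, 0.7]))
    # Values from the text: 82% frequency in 11 historical individuals.
    print(binomial_standard_error(0.82, 11))
```

      With 11 individuals (22 chromosomes), the sampling standard error around a frequency of 0.82 is roughly 0.08 even before imputation error is considered.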

      (5) I am lacking information to evaluate the procedure for turning the outliers into p-values. Both iHS and XP-EHH are ratio statistics, meaning they might be heavy-tailed if one is not careful, and the central limit theorem may not apply. It would be much easier (and probably sufficient for the points being made here) to reframe this analysis in terms of empirical outliers.

      Given that the reviewers disagree on the best approach to reporting selection scan results, in our revision we will supply both the standardized iHS / XP-EHH values and these values transformed to p-values in the supplementary information. As the p-values are derived from the empirical distribution, the “significant” p-values are also empirical outliers, so the conclusions of the manuscript do not change. We found that the p-value approach with FDR control is more conservative, with fewer signals reaching “significance” than are considered empirical outliers based on common approaches such as IQR or arbitrary percentile cutoffs.
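      The two reporting styles discussed above can be contrasted in a short sketch. The exact conversion from scores to p-values is not described in this response, so the version below assumes, purely for illustration, that standardized scores are treated as standard normal. The percentile cutoff, the function names, and the example values are likewise hypothetical. The Benjamini-Hochberg step is the standard procedure for FDR control.

```python
import math

import numpy as np


def percentile_outliers(scores, top_fraction=0.01):
    """Flag the most extreme |score| values by an empirical percentile cutoff."""
    magnitude = np.abs(np.asarray(scores, dtype=float))
    cutoff = np.quantile(magnitude, 1.0 - top_fraction)
    return magnitude >= cutoff


def two_sided_normal_p(z_scores):
    """Two-sided p-values, ASSUMING standardized scores are standard normal."""
    return np.array([math.erfc(abs(z) / math.sqrt(2.0)) for z in z_scores])


def benjamini_hochberg(p_values, alpha=0.05):
    """Boolean mask of tests rejected at false discovery rate alpha."""
    p = np.asarray(p_values, dtype=float)
    n = len(p)
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, n + 1) / n
    passed = p[order] <= thresholds
    reject = np.zeros(n, dtype=bool)
    if passed.any():
        # Reject every test up to the largest rank that meets its threshold.
        k = int(np.max(np.nonzero(passed)[0]))
        reject[order[:k + 1]] = True
    return reject


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.standard_normal(10000)          # hypothetical null scores
    z[:20] += 6.0                           # hypothetical strong signals
    print("percentile outliers:", int(percentile_outliers(z).sum()))
    print("FDR-significant:", int(benjamini_hochberg(two_sided_normal_p(z)).sum()))
```

      A fixed percentile cutoff always flags the same fraction of variants regardless of how extreme they are, whereas the FDR procedure flags only as many as the p-values support.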

      (6) Oldest individual predating gene flow: It seems impossible to make any statements based on a single individual. Why is it implausible that this person (or their parents), e.g., moved to the Faroes within their lifetime and died there?

      We agree with the reviewer that this is a plausible explanation, and in future revisions we will update the main text to acknowledge this possibility.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      Mast cells have previously been reported to play an important role in bacterial immune defense and act protectively in sepsis. However, many of these findings were based on studies using Kit mutant mice. In this study, the authors conducted a detailed investigation using mast cell-deficient Cpa3 Cre-Master mice. As a result, the authors found that the Cpa3 Cre-Master mice exhibited responses similar to wild-type mice in terms of bacterial immune defense. This suggests that the observed phenotype is not due to mast cell-dependent bacterial immune defense, but rather is associated with dysbiosis of the gut microbiota.

      Strengths:

      Mast cells have long been reported to play an important role in the protective response against sepsis, and their function in infection defense has been demonstrated. However, Kit mutant mice have been reported to exhibit impaired peristalsis, and several mast cell-specific genetically modified mouse lines have since been developed and examined in detail. This study presents an important finding by logically demonstrating that the exacerbation of sepsis in Kit mice is due to alterations in the gut microbiota, and that the phenotype previously thought to be mast cell-dependent was, in fact, not.

      In addition, the experiments were carefully designed using mice with matched genetic backgrounds. These findings underscore the importance of microbiota composition in interpreting immune phenotypes and highlight the need for co-housing controls in mutant mouse studies.

      A major strength of this work is the robustness of the CLP data, generated over eight years by three independent researchers across two institutions with large sample sizes, lending strong support to the conclusions.

      Weaknesses:

      The study assesses only a limited subset of gut bacterial species, leaving the extent to which E. coli expansion contributes to the observed phenotype unclear.

      We will add new data based on 16S rRNA sequencing to the revised version.

      Moreover, in the cohousing experiments, there is no evidence provided to confirm successful microbiota normalization between groups.

      We note that co-housing is a generally accepted method for microbiota equalization or conversion (Caruso et al., Cell Rep. 2019; Ridaura et al., Science 2013; reviewed in Moore et al., Clin. Transl. Immunol. 2016). In any case, Kit<sup>W/Wv</sup> mutants were made resistant to CLP by co-housing. Similar microbiota sequencing results between groups, while useful, would again only be correlative.

      A more detailed analysis of the microbial composition would be necessary to strengthen the reliability of the findings.

      See above

      It is also important to note that Cpa3-deficient mice exhibit not only mast cell depletion but also defects in basophils and T cells. These additional immunological alterations may counterbalance one another, potentially masking phenotypic changes and complicating interpretation.

      Regarding basophils: in Cpa3<sup>Cre</sup> mice, basophils are reduced to about 39% of wild-type levels (Feyerabend et al., Immunity 2011). In Kit<sup>W/Wv</sup> mice, basophils are reduced to about 11% of wild-type levels. To our knowledge, there has been no phenotype reported in which a reduction in basophils compensates for the loss of mast cells. Given that Kit<sup>W/Wv</sup> mice have about threefold lower numbers of basophils than Cpa3<sup>Cre</sup> mice, and are highly susceptible to sepsis, there is no evidence that a reduction in basophils is protective in mast cell-deficient mice. On the contrary, mice that were normal for mast cells but had their basophils depleted were more susceptible to sepsis (Piliponsky et al., Nat. Immunol. 2019). Hence, basophils appear to be protective, and their reduction increases susceptibility. In light of these data and considerations, there is no evidence for a reduction in basophils to counterbalance the loss of mast cells in Cpa3<sup>Cre</sup> mice.

      Regarding T cells, there is no evidence, and there are no reports, that Cpa3<sup>Cre</sup> mice have defects in T cells (Feyerabend et al., Immunity 2011, Feyerabend et al., Cell Metabolism 2016). Cpa3 is weakly and transiently expressed early in the T cell lineage (Feyerabend et al., Immunity 2009; for expression levels in T cells versus mast cells, see below figure from the Immgen Database). In summary, in contrast to the reviewer's claim, there are no known defects in T cell development or functions in Cpa3<sup>Cre</sup> mice.

      Author response image 1.

      Generated from the Immgen database. Shown are RNAseq gene expression levels of diverse T-cell and mast cell populations.

      Furthermore, it remains to be determined whether the altered gut microbiota observed in Kit<sup>W/Wv</sup> mice is a consequence of impaired intestinal motility, whether a similar phenotype is observed in Kit<sup>W-sh/W-sh</sup> mice, and whether comparable results occur in SCF-deficient models. Addressing these questions would provide greater clarity on the contribution of mast cells versus secondary factors in the observed phenotypes.

      First, mice without mast cells (Cpa3<sup>Cre</sup> mice) are as resistant to sepsis as wild-type mice. Hence, mast cells are not involved in the immunity against sepsis, and 'secondary factors' are not involved in this simple experiment (both groups of mice, wild type and Cpa3<sup>Cre</sup> mice, were on the identical genetic background). Second, Kit<sup>W/Wv</sup> mice are also as resistant to sepsis as wild-type mice when confronted with the identical intestinal slurry. Therefore, Kit<sup>W/Wv</sup> mice have no immune deficit in response to sepsis. Hence, in our view, the underlying immunological question regarding the role of mast cells in sepsis has been conclusively addressed by our data. Future studies may address the mechanism that causes dysbiosis in Kit<sup>W/Wv</sup> mice, and other Kit mutants and steel mutants could be examined as well. These questions are, however, unrelated to the role of mast cells in sepsis, or the response of Kit<sup>W/Wv</sup> mice to sepsis, and would therefore not affect the central conclusion of our manuscript ("Susceptibility of Kit-mutant mice to sepsis caused by enteral dysbiosis, not mast cell deficiency").

      Given that Kit<sup>W/Wv</sup> mice exhibit impaired peristalsis, is the observed increase in E. coli a consequence of this dysfunction?

      See above

      Previous studies with BMMC reconstitution experiments have indicated that mast cells are a source of TNF - how does this align with the current findings?

      It is possible that cultured and transplanted mast cells (BMMC) produce TNF. Given that we did not find a reduction in TNF levels in the peritoneal lavage or serum of mice without mast cells undergoing sepsis, mast cell-derived TNF does not seem to have a measurable impact on total TNF levels under physiological conditions.

      Reviewer #2 (Public review):

      Summary:

      This study presents a useful finding that the high susceptibility to CLP sepsis of Kit-mutant mice is not due to mast cell deficiency, but to dysbiosis.

      However, the present data are insufficient and incomplete to support the conclusion, and would benefit from more rigorous approaches. With the mechanism part strengthened, this paper would be of interest to researchers on mast cell biology and mucosal immunology.

      We disagree with this view that our data are insufficient and incomplete. Our results demonstrate that mice lacking mast cells (Cpa3<sup>Cre</sup> mice) are as resistant to sepsis as wild-type mice, indicating that mast cells do not play a detectable role in immunity against sepsis. Additionally, we show that Kit<sup>W/Wv</sup> mice exhibit the same resistance to sepsis as wild-type mice when confronted with the identical intestinal slurry. This finding demonstrates that Kit<sup>W/Wv</sup> mice have no immune deficit in response to sepsis. These central data are both sufficient and complete, given that our data fully address the immunological questions regarding the role of mast cells in sepsis. Our study aimed to investigate the role of mast cells in sepsis, not to examine the mechanisms of dysbiosis or associated pathological phenotypes in Kit mutant controls.

      Recommendations:

      (1) The authors showed that E. coli increases in the cecum of Kit-mutant mice, which causes high CLP susceptibility. However, they did not provide any evidence E. coli is responsible for the high susceptibility.

      We showed that E. coli CFUs were increased in the cecum of Kit-mutant mice, but we did not state that this causes CLP susceptibility. We wrote: 'Hence, Kit<sup>W/Wv</sup> microbiota contains high levels of E. coli, which may underlie the observed pathogenicity'. We demonstrated that intestinal slurry from Kit<sup>W/Wv</sup> mice is more pathogenic than intestinal slurry from wild-type mice. However, we did not search for, or identify, the bacterial species that causes this increased pathogenicity because we were addressing the role of mast cells in sepsis.

      In the Figure 3 experiments, the authors administered the same number of cecal bacteria and did not show the number of E. coli after the administration.

      The samples were split: one aliquot was analysed by microbiology and the other aliquot was injected intraperitoneally. Fig. 3d shows the colony forming units (for Lactobacilli and E. coli) from aliquots of cecal slurry used in the intraperitoneal injection experiments shown in Fig. 3a-c. Hence, our data show the colony forming units that were injected into the mice. It is unclear to us why this is not the key information rather than 'the number of E. coli after the administration'.

      The authors should provide evidence showing that depletion of E. coli decreases susceptibility.

      See response to point 1 above.

      (2) The author should provide direct evidence of dysbiosis by, for example, shotgun sequencing of cecal and fecal contents.

      The large increase in E. coli counts in Kit<sup>W/Wv</sup> mice is evidence of dysbiosis. To obtain data beyond classical microbiology, we also performed 16S rRNA sequencing, which will be included in the revision.

      (3) In case the authors find dysbiosis, they should analyze the mechanisms by which Kit mutation causes dysbiosis.

      The mechanism that causes dysbiosis in Kit<sup>W/Wv</sup> mice (which emerged from our work) belongs to other research areas that address the role of Kit in intestinal pathophysiology. These questions are unrelated to the role of mast cells in sepsis, or the response of Kit<sup>W/Wv</sup> mice to sepsis. Regardless of the results of such experiments, the conclusion ("Susceptibility of Kit-mutant mice to sepsis caused by enteral dysbiosis, not mast cell deficiency") remains unaffected. In brief, further explorations of pathological phenotypes of a control mutant will not add to the core message. Along these lines, the review process and the revision shall center on making the core of a paper as conclusive as possible, and not widen a paper by requests 'tangential to the main conclusion' (Kaelin Jr. Nature 2017).

      References

      Caruso, R., Ono, M., Bunker, M. E., Núñez, G. & Inohara, N. Dynamic and Asymmetric Changes of the Microbial Communities after Cohousing in Laboratory Mice. Cell Rep. 27, 3401-3412.e3 (2019).

      Feyerabend, T. B. et al. Deletion of Notch1 Converts Pro-T Cells to Dendritic Cells and Promotes Thymic B Cells by Cell-Extrinsic and Cell-Intrinsic Mechanisms. Immunity 30, 67–79 (2009).

      Feyerabend, T. B. et al. Cre-Mediated Cell Ablation Contests Mast Cell Contribution in Models of Antibody- and T Cell-Mediated Autoimmunity. Immunity 35, 832–844 (2011).

      Feyerabend, T. B., Gutierrez, D. A. & Rodewald, H.-R. Of Mouse Models of Mast Cell Deficiency and Metabolic Syndrome. Cell Metab 24, 1–2 (2016).

      Kaelin Jr, W. G. Publish houses of brick, not mansions of straw. Nature 545, 387–387 (2017).

      Moore, R. J. & Stanley, D. Experimental design considerations in microbiota/inflammation studies. Clin. Transl. Immunol. 5, e92 (2016).

      Piliponsky, A. M. et al. Basophil-derived tumor necrosis factor can enhance survival in a sepsis model in mice. Nat. Immunol. 20, 129–140 (2019).

      Ridaura, V. K. et al. Gut Microbiota from Twins Discordant for Obesity Modulate Metabolism in Mice. Science 341, 1241214 (2013).

    1. Author response:

      Reviewer #1 (Public review):

      Wang et al., recorded concurrent EEG-fMRI in 107 participants during nocturnal NREM sleep to investigate brain activity and connectivity related to slow oscillations (SO), sleep spindles, and in particular their co-occurrence. The authors found SO-spindle coupling to be correlated with increased thalamic and hippocampal activity, and with increased functional connectivity from the hippocampus to the thalamus and from the thalamus to the neocortex, especially the medial prefrontal cortex (mPFC). They concluded the brain-wide activation pattern to resemble episodic memory processing, but to be dissociated from task-related processing and suggest that the thalamus plays a crucial role in coordinating the hippocampal-cortical dialogue during sleep.

      The paper offers an impressively large and highly valuable dataset that provides the opportunity for gaining important new insights into the network substrate involved in SOs, spindles, and their coupling. However, the paper does unfortunately not exploit the full potential of this dataset with the analyses currently provided, and the interpretation of the results is often not backed up by the results presented. I have the following specific comments.

      Thank you for your thoughtful and constructive feedback. We greatly appreciate your recognition of the strengths of our dataset and findings. Below, we address your specific comments and provide responses to each point you raised to ensure our methods and results are as transparent and comprehensible as possible. We hope these revisions address your comments and further strengthen our manuscript.

      (1) The introduction is lacking sufficient review of the already existing literature on EEG-fMRI during sleep and the BOLD-correlates of slow oscillations and spindles in particular (Laufs et al., 2007; Schabus et al., 2007; Horovitz et al., 2008; Laufs, 2008; Czisch et al., 2009; Picchioni et al., 2010; Spoormaker et al., 2010; Caporro et al., 2011; Bergmann et al., 2012; Hale et al., 2016; Fogel et al., 2017; Moehlman et al., 2018; Ilhan-Bayrakci et al., 2022). The few studies mentioned are not discussed in terms of the methods used or insights gained.

      We acknowledge the need for a more comprehensive review of prior EEG-fMRI studies investigating BOLD correlates of slow oscillations and spindles. However, not all of these articles concern sleep SOs or spindles. Several (Hale et al., 2016; Horovitz et al., 2008; Laufs, 2008; Laufs, Walker, & Lund, 2007; Spoormaker et al., 2010) mainly focus on EEG-fMRI methodology, sleep stages, or brain networks, which are not the focus of our study. Thank you again for your attention to the comprehensiveness of our literature review. We will expand the introduction to include a more detailed discussion of the existing literature, ensuring that the contributions of previous EEG-fMRI sleep studies are adequately acknowledged.

      Introduction, Page 4 Lines 62-76

      “Investigating these sleep-related neural processes in humans is challenging because it requires tracking transient sleep rhythms while simultaneously assessing their widespread brain activation. Recent advances in simultaneous EEG-fMRI techniques provide a unique opportunity to explore these processes. EEG allows for precise event-based detection of neural signal, while fMRI provides insight into the broader spatial patterns of brain activation and functional connectivity (Horovitz et al., 2008; Huang et al., 2024; Laufs, 2008; Laufs, Walker, & Lund, 2007; Schabus et al., 2007; Spoormaker et al., 2010). Previous EEG-fMRI studies on sleep have focused on classifying sleep stages or examining the neural correlates of specific waves (Bergmann et al., 2012; Caporro et al., 2012; Czisch et al., 2009; Fogel et al., 2017; Hale et al., 2016; Ilhan-Bayrakcı et al., 2022; Moehlman et al., 2019; Picchioni et al., 2011). These studies have generally reported that slow oscillations are associated with widespread cortical and subcortical BOLD changes, whereas spindles elicit activation in the thalamus, as well as in several cortical and paralimbic regions. Although these findings provide valuable insights into the BOLD correlates of sleep rhythms, they often do not employ sophisticated temporal modeling (Huang et al., 2024), to capture the dynamic interactions between different oscillatory events, e.g., the coupling between SOs and spindles.”

      (2) The paper falls short in discussing the specific insights gained into the neurobiological substrate of the investigated slow oscillations, spindles, and their interactions. The validity of the inverse inference approach ("Open ended cognitive state decoding"), assuming certain cognitive functions to be related to these oscillations because of the brain regions/networks activated in temporal association with these events, is debatable at best. It is also unclear why eventually only episodic memory processing-like brain-wide activation is discussed further, even though 16 of 50 feature terms from the NeuroSynth v3 dataset were significant (episodic memory, declarative memory, working memory, task representation, language, learning, faces, visuospatial processing, category recognition, cognitive control, reading, cued attention, inhibition, and action).

      Thank you for pointing this out, particularly regarding the use of inverse inference approaches such as “open-ended cognitive state decoding.” Given the concerns about the indirectness of this approach, we decided to remove its related content and results from Figure 3 in the main text and include it in Supplementary Figure 7. We will refocus the main text on direct neurobiological insights gained from our EEG-fMRI analyses, particularly emphasizing the hippocampal-thalamocortical network dynamics underlying SO-spindle coupling, and we will acknowledge the exploratory nature of these findings and highlight their limitations.

      Discussion, Page 17-18 Lines 323-332

      “To explore functional relevance, we employed an open-ended cognitive state decoding approach using meta-analytic data (NeuroSynth: Yarkoni et al. (2011)). Although this method usefully generates hypotheses about potential cognitive processes, particularly in the absence of a pre- and post-sleep memory task, it is inherently indirect. Many cognitive terms showed significant associations (16 of 50), such as “episodic memory,” “declarative memory,” and “working memory.” We focused on episodic/declarative memory given the known link with hippocampal reactivation (Diekelmann & Born, 2010; Staresina et al., 2015; Staresina et al., 2023). Nonetheless, these inferences regarding memory reactivation should be interpreted cautiously without direct behavioral measures. Future research incorporating explicit tasks before and after sleep would more rigorously validate these potential functional claims.”

      (3) Hippocampal activation during SO-spindles is stated as a main hypothesis of the paper - for good reasons - however, other regions (e.g., several cortical as well as thalamic) would be equally expected given the known origin of both oscillations and the existing sleep-EEG-fMRI literature. However, this focus on the hippocampus contrasts with the focus on investigating the key role of the thalamus instead in the Results section.

      We appreciate your insight regarding the relative emphasis on hippocampal and thalamic activation in our study. We recognize that the manuscript may currently present an inconsistency between our initial hypothesis and the main focus of the results. To address this concern, we will ensure that our Introduction and Discussion sections explicitly discuss both regions, highlighting the complementary roles of the hippocampus (memory processing and reactivation) and the thalamus (spindle generation and cortico-hippocampal coordination) in SO-spindle dynamics.

      Introduction, Page 5 Lines 87-103

      “To address this gap, our study investigates brain-wide activation and functional connectivity patterns associated with SO-spindle coupling, and employs a cognitive state decoding approach (Margulies et al., 2016; Yarkoni et al., 2011)—albeit indirectly—to infer potential cognitive functions. In the current study, we used simultaneous EEG-fMRI recordings during nocturnal naps (detailed sleep staging results are provided in the Methods and Table S1) in 107 participants. Although directly detecting hippocampal ripples using scalp EEG or fMRI is challenging, we expected that hippocampal activation in fMRI would coincide with SO-spindle coupling detected by EEG, given that SOs, spindles, and ripples frequently co-occur during NREM sleep. We also anticipated a critical role of the thalamus, particularly thalamic spindles, in coordinating hippocampal-cortical communication.

      We found significant coupling between SOs and spindles during NREM sleep (N2/3), with spindle peaks occurring slightly before the SO peak. This coupling was associated with increased activation in both the thalamus and hippocampus, with functional connectivity patterns suggesting thalamic coordination of hippocampal-cortical communication. These findings highlight the key role of the thalamus in coordinating hippocampal-cortical interactions during human sleep and provide new insights into the neural mechanisms underlying sleep-dependent brain communication. A deeper understanding of these mechanisms may contribute to future neuromodulation approaches aimed at enhancing sleep-dependent cognitive function and treating sleep-related disorders.”

      Discussion, Page 16-17 Lines 292-307

      “When modeling the timing of these sleep rhythms in the fMRI, we observed hippocampal activation selectively during SO-spindle events. This suggests the possibility of triple coupling (SOs–spindles–ripples), even though our scalp EEG was not sufficiently sensitive to detect hippocampal ripples—key markers of memory replay (Buzsáki, 2015). Recent iEEG evidence indicates that ripples often co-occur with both spindles (Ngo, Fell, & Staresina, 2020) and SOs (Staresina et al., 2015; Staresina et al., 2023). Therefore, the hippocampal involvement during SO-spindle events in our study may reflect memory replay from the hippocampus, propagated via thalamic spindles to distributed cortical regions.

      The thalamus, known to generate spindles (Halassa et al., 2011), plays a key role in producing and coordinating sleep rhythms (Coulon, Budde, & Pape, 2012; Crunelli et al., 2018), while the hippocampus is found essential for memory consolidation (Buzsáki, 2015; Diba & Buzsáki, 2007; Singh, Norman, & Schapiro, 2022). The increased hippocampal and thalamic activity, along with strengthened connectivity between these regions and the mPFC during SO-spindle events, underscores a hippocampal-thalamic-neocortical information flow. This aligns with recent findings suggesting the thalamus orchestrates neocortical oscillations during sleep (Schreiner et al., 2022). The thalamus and hippocampus thus appear central to memory consolidation during sleep, guiding information transfer to the neocortex, e.g., mPFC.”

      (4) The study included an impressive number of 107 subjects. It is surprising though that only 31 subjects had to be excluded under these difficult recording conditions, especially since no adaptation night was performed. Since only subjects were excluded who slept less than 10 min (or had excessive head movements) there are likely several datasets included with comparably short durations and only a small number of SOs and spindles and even less combined SO-spindle events. A comprehensive table should be provided (supplement) including for each subject (included and excluded) the duration of included NREM sleep, number of SOs, spindles, and SO+spindle events. Also, some descriptive statistics (mean/SD/range) would be helpful.

      We appreciate your recognition of our sample size and the challenges associated with simultaneous EEG-fMRI sleep recordings. We acknowledge the importance of transparently reporting individual subject data, particularly regarding sleep duration and the number of detected SOs, spindles, and SO-spindle events. To address this, we will provide comprehensive tables in the supplementary materials containing descriptive information about sleep-related characteristics (Table S1), as well as detailed information about sleep waves at each sleep stage for all 107 subjects (Tables S2-S4), listing for each subject: (1) duration of each sleep stage; (2) number of detected SOs; (3) number of detected spindles; (4) number of detected SO-spindle coupling events; (5) density of detected SOs; (6) density of detected spindles; (7) density of detected SO-spindle coupling events.

      However, most of the excluded participants were unable to fall asleep or slept too briefly, so they had essentially no NREM sleep. For these participants, NREM sleep duration and SO, spindle, and coupling counts could therefore not be computed.

      Supplementary Materials, Page 42-54, Table S1-S4

      (Given their length, we do not reproduce the tables here. Please refer to the revised manuscript.)
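
      For readers unfamiliar with the density measures listed in this response, the following minimal sketch shows how a per-subject event density and the requested descriptive statistics (mean/SD/range) could be derived. It assumes density is defined as events per minute of the relevant sleep stage, which the response does not state. The subject rows are made-up illustrative numbers, not values from Tables S1-S4, and all names are hypothetical.

```python
from statistics import mean, stdev

def event_density(n_events, stage_minutes):
    """Events per minute of a sleep stage; None when the stage was not observed."""
    if stage_minutes <= 0:
        return None
    return n_events / stage_minutes

def describe(values):
    """Count, mean, sample SD, and range of the non-missing values."""
    present = [v for v in values if v is not None]
    return {
        "n": len(present),
        "mean": mean(present),
        "sd": stdev(present) if len(present) > 1 else None,
        "min": min(present),
        "max": max(present),
    }

# Hypothetical rows for illustration only (not data from the study).
subjects = [
    {"id": "sub-A", "n23_minutes": 40.0, "n_so": 120, "n_spindle": 200, "n_coupled": 30},
    {"id": "sub-B", "n23_minutes": 12.5, "n_so": 25, "n_spindle": 50, "n_coupled": 5},
    {"id": "sub-C", "n23_minutes": 0.0, "n_so": 0, "n_spindle": 0, "n_coupled": 0},
]

coupled_density = [event_density(s["n_coupled"], s["n23_minutes"]) for s in subjects]
summary = describe(coupled_density)
print(coupled_density)
print(summary)
```

      Returning a missing value for subjects without any scored N2/3 sleep keeps them visible in the table while excluding them from the summary statistics.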

      (5) Was the 20-channel head coil dedicated for EEG-fMRI measurements? How were the electrode cables guided through/out of the head coil? Usually, the 64-channel head coil is used for EEG-fMRI measurements in a Siemens PRISMA 3T scanner, which has a cable duct at the back that allows to guide the cables straight out of the head coil (to minimize MR-related artifacts). The choice for the 20-channel head coil should be motivated. Photos of the recording setup would also be helpful.

      Thank you for your comment regarding our choice of the 20-channel head coil for EEG-fMRI measurements. We acknowledge that the 64-channel head coil is commonly used in Siemens PRISMA 3T scanners; however, the 20-channel coil was selected due to specific practical and technical considerations in our study. In particular, the 20-channel head coil was compatible with our EEG system and ensured sufficient signal-to-noise ratio (SNR) for both EEG and fMRI acquisition. The EEG electrode cables were guided through the lateral and posterior openings of the head coil, secured with foam padding to reduce motion and minimize MR-related artifacts. Moreover, given the extended nature of nocturnal sleep recordings, the 20-channel coil allowed us to maintain participant comfort while still achieving high-quality simultaneous EEG-fMRI data.

      We have made this clearer in the revised manuscript.

      Methods, Page 20 Lines 385-392

      “All MRI data were acquired using a 20-channel head coil on a research-dedicated 3-Tesla Siemens Magnetom Prisma MRI scanner. Earplugs and cushions were provided for noise protection and head motion restriction. We chose the 20-channel head coil because it was compatible with our EEG system and ensured sufficient signal-to-noise ratio (SNR) for both EEG and fMRI acquisition. The EEG electrode cables were guided through the lateral and posterior openings of the head coil, secured with foam padding to reduce motion and minimize MR-related artifacts. Moreover, given the extended nature of nocturnal sleep recordings, the 20-channel coil helped maintain participant comfort while still achieving high-quality simultaneous EEG-fMRI data.”

      (6) Was the EEG sampling synchronized to the MR scanner (gradient system) clock (the 10 MHz signal; not referring to the volume TTL triggers here)? This is a requirement for stable gradient artifact shape over time and thus accurate gradient noise removal.

      Thank you for raising this important point. We confirm that the EEG sampling was synchronized to the MR scanner’s 10 MHz gradient system clock, ensuring a stable gradient artifact shape over time and enabling accurate artifact removal. This synchronization was achieved using the standard clock synchronization interface of the EEG amplifier, minimizing timing jitter and drift. As a result, the gradient artifact waveform remained stable across volumes, allowing for more effective artifact correction during preprocessing. We appreciate your attention to this critical aspect of EEG-fMRI data acquisition.

      We have made this clearer in the revised manuscript.

      Methods, Page 19-20 Lines 371-383

      “EEG was recorded simultaneously with fMRI data using an MR-compatible EEG amplifier system (BrainAmps MR-Plus, Brain Products, Germany), along with a specialized electrode cap. The recording was done using 64 channels in the international 10/20 system, with the reference channel positioned at FCz. In order to adhere to polysomnography (PSG) recording standards, six electrodes were removed from the EEG cap: one for electrocardiogram (ECG) recording, two for electrooculogram (EOG) recording, and three for electromyogram (EMG) recording. EEG data was recorded at a sample rate of 5000 Hz, the resistance of the reference and ground channels was kept below 10 kΩ, and the resistance of the other channels was kept below 20 kΩ. To synchronize the EEG and fMRI recordings, the BrainVision recording software (BrainProducts, Germany) was utilized to capture triggers from the MRI scanner. The EEG sampling was synchronized to the MR scanner’s 10 MHz gradient system clock, ensuring a stable gradient artifact shape over time and enabling accurate artifact removal. This was achieved via the standard clock synchronization interface of the EEG amplifier, minimizing timing jitter and drift.”
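
      The reviewer's point about clock synchronization can be made concrete with a small arithmetic sketch. It uses the 10 MHz scanner clock and the 5000 Hz EEG sample rate stated above, together with the TR of 2000 ms reported in the sequence specification. The 10 ppm clock mismatch in the second case is a purely hypothetical value chosen for illustration, and the function names are not taken from any acquisition software.

```python
from fractions import Fraction

SCANNER_CLOCK_HZ = 10_000_000  # gradient system clock, as stated in the response
EEG_RATE_HZ = 5_000            # EEG sample rate, as stated in the Methods
TR_SECONDS = Fraction(2)       # TR = 2000 ms, from the sequence specification

def ticks_per_sample(clock_hz, rate_hz):
    """Scanner clock ticks that elapse during one EEG sample."""
    return Fraction(clock_hz, rate_hz)

def samples_per_volume(tr_seconds, rate_hz):
    """EEG samples acquired during one fMRI volume."""
    return tr_seconds * rate_hz

def onset_phases(tr_seconds, actual_rate_hz, n_volumes):
    """Fractional position (0 <= phase < 1) of each volume onset on the EEG sample grid."""
    return [(k * tr_seconds * actual_rate_hz) % 1 for k in range(n_volumes)]

# Synchronized case: the EEG rate is derived from the scanner clock, so it is exact.
locked = onset_phases(TR_SECONDS, Fraction(EEG_RATE_HZ), 4)

# Hypothetical free-running case: the EEG clock runs 10 ppm fast (illustrative assumption).
drifting_rate = Fraction(EEG_RATE_HZ) * (1 + Fraction(10, 1_000_000))
drifting = onset_phases(TR_SECONDS, drifting_rate, 4)

print(ticks_per_sample(SCANNER_CLOCK_HZ, EEG_RATE_HZ), samples_per_volume(TR_SECONDS, EEG_RATE_HZ))
print(locked)
print(drifting)
```

      When each volume spans a whole number of EEG samples and the clocks are locked, every volume onset falls on the same sample phase, so the gradient artifact is digitized identically from volume to volume and a template-based subtraction stays accurate. With a free-running clock the phase shifts a little at every volume, which changes the sampled artifact shape over time.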

      (7) The TR is quite long and the voxel size is quite large in comparison to state-of-the-art EPI sequences. What was the rationale behind choosing a sequence with relatively low temporal and spatial resolution?

      We acknowledge that our chosen TR and voxel size are relatively long and large compared to state-of-the-art EPI sequences. This decision was made to optimize the signal-to-noise ratio (SNR) and reduce susceptibility-related distortions, which are particularly critical in EEG-fMRI sleep studies where head motion and physiological noise can be substantial. A longer TR allowed us to sample whole-brain activity with sufficient coverage, while a larger voxel size helped enhance BOLD sensitivity and minimize partial volume effects in deep brain structures such as the thalamus and hippocampus, which are key regions of interest in our study. We appreciate your concern and hope this clarification provides sufficient rationale for our sequence parameters.

      We have made this clearer in the revised manuscript.

      Methods, Page 20-21 Lines 398-408

      “Then, the “sleep” session began after the participants were instructed to try and fall asleep. For the functional scans, whole-brain images were acquired using k-space and steady-state T2*-weighted gradient echo-planar imaging (EPI) sequence that is sensitive to the BOLD contrast. This measures local magnetic changes caused by changes in blood oxygenation that accompany neural activity (sequence specification: 33 slices in interleaved ascending order, TR = 2000 ms, TE = 30 ms, voxel size = 3.5 × 3.5 × 4.2 mm<sup>3</sup>, FA = 90°, matrix = 64 × 64, gap = 0.7 mm). A relatively long TR and larger voxel size were chosen to optimize SNR and reduce susceptibility-related distortions, which are critical in EEG-fMRI sleep studies where head motion and physiological noise can be substantial. The longer TR allowed whole-brain coverage with sufficient temporal resolution, while the larger voxel size helped enhance BOLD sensitivity and minimize partial volume effects in deep brain structures (e.g., the thalamus and hippocampus), which are key regions of interest in this study.”

      (8) The anatomically defined ROIs are quite large. It should be elaborated on how this might reduce sensitivity to sleep rhythm-specific activity within sub-regions, especially for the thalamus, which has distinct nuclei involved in sleep functions.

      We appreciate your insight regarding the use of anatomically defined ROIs and their potential limitations in detecting sleep rhythm-specific activity within sub-regions, particularly in the thalamus. Given the distinct functional roles of thalamic nuclei in sleep processes, we acknowledge that using a single, large thalamic ROI may reduce sensitivity to localized activity patterns. To address this, we will discuss this limitation in the revised manuscript, acknowledging that our approach prioritizes whole-structure effects but may not fully capture nucleus-specific contributions.

      Discussion, Page 18 Lines 333-341

      “Despite providing new insights, our study has several limitations. First, our scalp EEG did not directly capture hippocampal ripples, preventing us from conclusively demonstrating triple coupling. Second, the combination of EEG-fMRI and the lack of a memory task limit our ability to parse fine-grained BOLD responses at the DOWN- vs. UP-states of SOs and link observed activations to behavioral outcomes. Third, the use of large anatomical ROIs may mask subregional contributions of specific thalamic nuclei or hippocampal subfields. Finally, without a memory task, we cannot establish a direct behavioral link between sleep-rhythm-locked activation and memory consolidation. Future studies combining techniques such as ultra-high-field fMRI or iEEG with cognitive tasks may refine our understanding of subregional network dynamics and functional significance during sleep.”

      (9) The study reports SO & spindle amplitudes & densities, as well as SO+spindle coupling, to be larger during N2/3 sleep compared to N1 and REM sleep, which is trivial but can be seen as a sanity check of the data. However, the amount of SOs and spindles reported for N1 and REM sleep is concerning, as per definition there should be hardly any (if SOs or spindles occur in N1 it becomes by definition N2, and the interval between spindles has to be considerably large in REM to still be scored as such). Thus, on the one hand, the report of these comparisons takes too much space in the main manuscript as it is trivial, but on the other hand, it raises concerns about the validity of the scoring.

      We appreciate your concern regarding the reported presence of SOs and spindles in N1 and REM sleep and its potential implications. Our methods for detecting SOs, spindles, and their coupling were originally designed only for N2 and N3 sleep data, based on the characteristics of those data, and they are widely recognized and used in sleep research (Hahn et al., 2020; Helfrich et al., 2019; Helfrich et al., 2018; Ngo, Fell, & Staresina, 2020; Schreiner et al., 2022; Schreiner et al., 2021; Staresina et al., 2015; Staresina et al., 2023). However, because the detection methods for SOs and spindles are percentile-based, they will always detect a certain number of events when applied to data from other stages (N1 and REM), and it remains unclear how these events differ from those detected in N2/N3. We will explain the reasons for these results in the Methods section and emphasize that they are used only as sanity checks.

      Methods, Page 25 Lines 515-524

      “We note that the above methods for detecting SOs, spindles, and their couplings were originally developed for N2 and N3 sleep data, based on the specific characteristics of these stages. These methods are widely recognized in sleep research (Hahn et al., 2020; Helfrich et al., 2019; Helfrich et al., 2018; Ngo, Fell, & Staresina, 2020; Schreiner et al., 2022; Schreiner et al., 2021; Staresina et al., 2015; Staresina et al., 2023). However, because this percentile-based detection approach will inherently identify a certain number of events if applied to other stages (e.g., N1 and REM), the nature of these events in those stages remains unclear compared to N2/N3. We nevertheless identified and reported the detailed descriptive statistics of these sleep rhythms in all sleep stages, under the same operational definitions, both for completeness and as a sanity check. Within the same subject, there should be more SOs, spindles, and their couplings in N2/N3 than in N1 or REM (see also Figure S2-S4, Table S1-S4).”
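
The point about percentile-based detection can be illustrated with a minimal sketch. It assumes a simplified detector that keeps candidates whose amplitude exceeds the 75th percentile of the same recording segment; the cutoff, the function names, and the simulated amplitudes are illustrative assumptions, not the parameters used in the manuscript. Because the threshold is relative to the data it is applied to, such a detector returns the same share of events in every stage, even when the absolute amplitudes differ greatly.

```python
import random

def percentile(values, q):
    """Return the q-th percentile (0-100) using linear interpolation."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] * (1 - frac) + ordered[hi] * frac

def detect_events(amplitudes, q=75.0):
    """Keep candidates whose amplitude exceeds the q-th percentile of the same data."""
    threshold = percentile(amplitudes, q)
    return [a for a in amplitudes if a > threshold]

rng = random.Random(0)
# Hypothetical candidate amplitudes (µV): large in "N2/N3", small in "REM".
n23_candidates = [abs(rng.gauss(60, 15)) for _ in range(400)]
rem_candidates = [abs(rng.gauss(10, 3)) for _ in range(400)]

n23_events = detect_events(n23_candidates)
rem_events = detect_events(rem_candidates)

# Same share of candidates is flagged in both stages, by construction.
n23_mean = sum(n23_events) / len(n23_events)
rem_mean = sum(rem_events) / len(rem_events)
```

In this toy example both stages yield the same number of flagged events, while the mean amplitude of those events differs, which is why event counts from N1 or REM are best read only as a sanity check.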

      (10) Why was electrode F3 used to quantify the occurrence of SOs and spindles? Why not a midline frontal electrode like Fz (or a number of frontal electrodes for SOs) and Cz (or a number of centroparietal electrodes) for spindles to be closer to their maximum topography?

      We appreciate your suggestion regarding electrode selection for SO and spindle quantification. Our choice of F3 was primarily based on previous studies (Massimini et al., 2004; Molle et al., 2011), where bilateral frontal electrodes are commonly used for detecting SOs and spindles. Additionally, we considered the impact of MRI-related noise and, after a comprehensive evaluation, determined that F3 provided an optimal balance between signal quality and artifact minimization. We also acknowledge that alternative electrode choices, such as Fz for SOs and Cz for spindles, could provide additional insights into their topographical distributions.

      (11) Functional connectivity (hippocampus -> thalamus -> cortex (mPFC)) is reported to be increased during SO-spindle coupling and interpreted as evidence for coordination of hippocampo-neocortical communication likely by thalamic spindles. However, functional connectivity was only analysed during coupled SO+spindle events, not during isolated SOs or isolated spindles. Without the direct comparison of the connectivity patterns between these three events, it remains unclear whether this is specific for coupled SO+spindle events or rather associated with one or both of the other isolated events. The PPIs need to be conducted for those isolated events as well and compared statistically to the coupled events.

      We appreciate your critical perspective on our functional connectivity analysis and the interpretation of hippocampus-thalamus-cortex (mPFC) interactions during SO-spindle coupling. We acknowledge that, in the original analysis, functional connectivity was only examined during coupled SO-spindle events, without direct comparison to isolated SOs or isolated spindles. To address this concern, we have conducted PPI analyses for all three ROIs (hippocampus, thalamus, mPFC) and all three event types (SO-spindle couplings, isolated SOs, and isolated spindles). Our results indicate that neither isolated SOs nor isolated spindles yielded significant connectivity changes in any of the three ROIs, as none survived multiple comparison correction. This suggests that the observed connectivity increase is specific to SO-spindle coupling, rather than being independently driven by either SOs or spindles alone.

      Results, Page 14 Lines 248-255

      “Crucially, the interaction between FC and SO-spindle coupling revealed that only the functional connectivity of hippocampus -> thalamus (ROI analysis, t<sub>(106)</sub> = 1.86, p = 0.0328) and thalamus -> mPFC (ROI analysis, t<sub>(106)</sub> = 1.98, p = 0.0251) significantly increased during SO-spindle coupling, with no significant changes in all other pathways (Fig. 4e). We also conducted PPI analyses for the other two events (SOs and spindles), and neither yielded significant connectivity changes in the three ROIs, as all failed to survive whole-brain FWE correction at the cluster level (p < 0.05). Together, these findings suggest that the thalamus, likely via spindles, coordinates hippocampal-cortical communication selectively during SO-spindle coupling, but not isolated SOs or spindle events alone.”

      (12) The limited temporal resolution of fMRI does indeed not allow for easily distinguishing between fMRI activation patterns related to SO-up- vs. SO-down-states. For this, one could try to extract the amplitudes of SO-up- and SO-down-states separately for each SO event and model them as two separate parametric modulators (with the risk of collinearity as they are likely correlated).

      We appreciate your insightful comment regarding the challenge of distinguishing fMRI activation patterns related to SO-up vs. SO-down states due to the limited temporal resolution of fMRI. While our current analysis does not differentiate between these two phases, we acknowledge that separately modeling SO-up and SO-down states using parametric modulators could provide a more refined understanding of their distinct neural correlates. However, as you note, this approach carries the risk of collinearity, and there is indeed a high correlation between the two amplitudes across all subjects in our results (r = 0.98). Future studies could address this question by leveraging high-temporal-resolution techniques. While implementing this in the current study is beyond our scope, we will acknowledge this limitation in the Discussion section.
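
The collinearity concern can be made concrete with the variance inflation factor (VIF) for two regressors, VIF = 1 / (1 - r²). The sketch below is only an illustration: it treats the reported across-subject correlation (r = 0.98) as if it were the correlation between the two parametric modulators within a model, which is an assumption, and the function name is hypothetical.

```python
import math

def variance_inflation_factor(r):
    """VIF for one of two regressors whose Pearson correlation is r."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return 1.0 / (1.0 - r * r)

# Correlation between SO-up and SO-down amplitudes, as reported in the response.
r_up_down = 0.98
vif = variance_inflation_factor(r_up_down)

# The standard error of each modulator's estimate grows by sqrt(VIF).
se_inflation = math.sqrt(vif)
```

Under this assumption the variance of each modulator's estimate would be inflated roughly 25-fold (standard errors roughly 5-fold), so separate UP- and DOWN-state effects would be hard to estimate reliably.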

      Discussion, Page 17 Lines 308-322

      “An intriguing aspect of our findings is the reduced DMN activity during SOs when modeled at the SO trough (DOWN-state). This reduced DMN activity may reflect large-scale neural inhibition characteristic of the SO trough. The DMN is typically active during internally oriented cognition (e.g., self-referential processing or mind-wandering) and is suppressed during external stimuli processing (Yeshurun, Nguyen, & Hasson, 2021). It is unlikely, however, that this suppression of DMN during SO events is related to a shift from internal cognition to external responses given it is during deep sleep time. Instead, it could be driven by the inherent rhythmic pattern of SOs, which makes it difficult to separate UP- from DOWN-states (the two temporal regressors were highly correlated, and similar brain activation during SOs events was obtained if modelled at the SO peak instead, Fig. S5). Since the amplitude at the SO trough is consistently larger than that at the SO peak, the neural activation we detected may primarily capture the large-scale inhibition from DOWN-state. Interestingly, no such DMN reduction was found during SO-spindle coupling, implying that coupling may involve distinct neural dynamics that partially re-engage DMN-related processes, possibly reflecting memory-related reactivation. Future research using high-temporal-resolution techniques like iEEG could clarify these possibilities.”

      Discussion, Page 18 Lines 333-341

      “Despite providing new insights, our study has several limitations. First, our scalp EEG did not directly capture hippocampal ripples, preventing us from conclusively demonstrating triple coupling. Second, the combination of EEG-fMRI and the lack of a memory task limit our ability to parse fine-grained BOLD responses at the DOWN- vs. UP-states of SOs and link observed activations to behavioral outcomes. Third, the use of large anatomical ROIs may mask subregional contributions of specific thalamic nuclei or hippocampal subfields. Finally, without a memory task, we cannot establish a direct behavioral link between sleep-rhythm-locked activation and memory consolidation. Future studies combining techniques such as ultra-high-field fMRI or iEEG with cognitive tasks may refine our understanding of subregional network dynamics and functional significance during sleep.”

      (13) L327: "It is likely that our findings of diminished DMN activity reflect brain activity during the SO DOWN-state, as this state consistently shows higher amplitude compared to the UP-state within subjects, which is why we modelled the SO trough as its onset in the fMRI analysis." This conclusion is not justified as the fact that SO down-states are larger in amplitude does not mean their impact on the BOLD response is larger.

      We appreciate your concern regarding our interpretation of diminished DMN activity as reflecting the SO DOWN-state. We acknowledge that the original wording was somewhat misleading. Our interpretation is that the effect could be driven by the inherent rhythmic pattern of SOs, which makes it difficult to separate UP- from DOWN-states (the two temporal regressors were highly correlated, and similar brain activation during SO events was obtained when modelling at the SO peak instead). Since the amplitude at the SO trough is consistently larger than that at the SO peak, the neural activation we detected may primarily capture the large-scale inhibition of the DOWN-state. We will make this clear in the Discussion section.

      Discussion, Page 17 Lines 308-322

      “An intriguing aspect of our findings is the reduced DMN activity during SOs when modeled at the SO trough (DOWN-state). This reduced DMN activity may reflect large-scale neural inhibition characteristic of the SO trough. The DMN is typically active during internally oriented cognition (e.g., self-referential processing or mind-wandering) and is suppressed during external stimuli processing (Yeshurun, Nguyen, & Hasson, 2021). It is unlikely, however, that this suppression of DMN during SO events is related to a shift from internal cognition to external responses given it is during deep sleep time. Instead, it could be driven by the inherent rhythmic pattern of SOs, which makes it difficult to separate UP- from DOWN-states (the two temporal regressors were highly correlated, and similar brain activation during SOs events was obtained if modelled at the SO peak instead, Fig. S5). Since the amplitude at the SO trough is consistently larger than that at the SO peak, the neural activation we detected may primarily capture the large-scale inhibition from DOWN-state. Interestingly, no such DMN reduction was found during SO-spindle coupling, implying that coupling may involve distinct neural dynamics that partially re-engage DMN-related processes, possibly reflecting memory-related reactivation. Future research using high-temporal-resolution techniques like iEEG could clarify these possibilities.”

      (14) Line 77: "In the current study, while directly capturing hippocampal ripples with scalp EEG or fMRI is difficult, we expect to observe hippocampal activation in fMRI whenever SOs-spindles coupling is detected by EEG, if SOs- spindles-ripples triple coupling occurs during human NREM sleep". Not all SO-spindle events are associated with ripples (Staresina et al., 2015), but hippocampal activation may also be expected based on the occurrence of spindles alone (Bergmann et al., 2012).

      We appreciate your clarification regarding the relationship between SO-spindle coupling and hippocampal ripples. We acknowledge that not all SO-spindle events are necessarily accompanied by ripples (Staresina et al., 2015). However, previous research indicates that hippocampal ripples are significantly more likely to occur during SO-spindle coupling events. This suggests that while ripple occurrence is not guaranteed, SO-spindle coupling creates a favorable network state for ripple generation and potential hippocampal activation. To ensure accuracy, we will delete this misleading sentence from the Introduction and acknowledge in the Discussion that our results cannot conclusively demonstrate the triple coupling of SOs, spindles, and hippocampal ripples.

      Discussion, Page 18 Lines 333-341

      “Despite providing new insights, our study has several limitations. First, our scalp EEG did not directly capture hippocampal ripples, preventing us from conclusively demonstrating triple coupling. Second, the combination of EEG-fMRI and the lack of a memory task limit our ability to parse fine-grained BOLD responses at the DOWN- vs. UP-states of SOs and link observed activations to behavioral outcomes. Third, the use of large anatomical ROIs may mask subregional contributions of specific thalamic nuclei or hippocampal subfields. Finally, without a memory task, we cannot establish a direct behavioral link between sleep-rhythm-locked activation and memory consolidation. Future studies combining techniques such as ultra-high-field fMRI or iEEG with cognitive tasks may refine our understanding of subregional network dynamics and functional significance during sleep.”

      Reviewer #2 (Public review):

      In this study, Wang and colleagues aimed to explore brain-wide activation patterns associated with NREM sleep oscillations, including slow oscillations (SOs), spindles, and SO-spindle coupling events. Their findings reveal that SO-spindle events corresponded with increased activation in both the thalamus and hippocampus. Additionally, they observed that SO-spindle coupling was linked to heightened functional connectivity from the hippocampus to the thalamus, and from the thalamus to the medial prefrontal cortex, three key regions involved in memory consolidation and episodic memory processes.

      This study's findings are timely and highly relevant to the field. The authors' extensive data collection, involving 107 participants sleeping in an fMRI while undergoing simultaneous EEG recording, deserves special recognition. If shared, this unique dataset could lead to further valuable insights. While the conclusions of the data seem overall well supported by the data, some aspects with regard to the detection of sleep oscillations need clarification.

      The authors report that coupled SO-spindle events were most frequent during NREM sleep (2.46 ± 0.06 events/min), but they also observed a surprisingly high occurrence of these events during N1 and REM sleep (2.23 ± 0.09 and 2.32 ± 0.09 events/min, respectively), where SO-spindle coupling would not typically be expected. Combined with the relatively modest SO amplitudes reported (~25 µV, whereas >75 µV would be expected when using mastoids as reference electrodes), this raises the possibility that the parameters used for event detection may not have been conservative enough - or that sleep staging was inaccurately performed. This issue could present a significant challenge, as the fMRI findings are largely dependent on the reliability of these detected events.

      Thank you very much for your thorough and encouraging review. We appreciate your recognition of the significance and relevance of our study and dataset, particularly in highlighting how simultaneous EEG-fMRI recordings can provide complementary insights into the temporal dynamics of neural oscillations and their associated spatial activation patterns during sleep. In the sections that follow, we address each of your comments in detail. We have revised the text and conducted additional analyses wherever possible to strengthen our argument and clarify our methodological choices. We believe these revisions improve the clarity and rigor of our work, and we thank you for helping us refine it.

      We appreciate your insightful comments regarding the detection of sleep oscillations. Our methods for detecting SOs, spindles, and their couplings were originally developed for N2 and N3 sleep data, based on the specific characteristics of these stages. These methods are widely recognized in sleep research (Hahn et al., 2020; Helfrich et al., 2019; Helfrich et al., 2018; Ngo, Fell, & Staresina, 2020; Schreiner et al., 2022; Schreiner et al., 2021; Staresina et al., 2015; Staresina et al., 2023). However, because this percentile-based detection approach will inherently identify a certain number of events if applied to other stages (e.g., N1 and REM), the nature of these events in those stages remains unclear compared to N2/N3. We nevertheless identified and reported the detailed descriptive statistics of these sleep rhythms in all sleep stages, under the same operational definitions, both for completeness and as a sanity check. Within the same subject, there should be more SOs, spindles, and their couplings in N2/N3 than in N1 or REM. We will acknowledge the reasons for these results in the Methods section and emphasize that they are used only for sanity checks.

      Regarding the reported SO amplitudes (~25 µV), during preprocessing, we applied the Signal Space Projection (SSP) method to more effectively remove MRI gradient artifacts and cardiac pulse noise. While this approach enhances data quality, it also reduces overall signal power, leading to systematically lower reported amplitudes. Despite this, the SOs we detected in NREM sleep (especially N2/N3) remain physiologically meaningful and are consistent with previous fMRI studies using similar artifact removal techniques. We appreciate your careful evaluation and valuable suggestions.

      In addition, we will provide comprehensive tables in the supplementary materials, containing descriptive information about sleep-related characteristics (Table S1) as well as detailed information about sleep waves at each sleep stage for all 107 subjects (Table S2-S4), listing for each subject: (1) duration of each sleep stage; (2) number of detected SOs; (3) number of detected spindles; (4) number of detected SO-spindle coupling events; (5) density of detected SOs; (6) density of detected spindles; (7) density of detected SO-spindle coupling events.

      Methods, Page 25 Lines 515-524

      “We note that the above methods for detecting SOs, spindles, and their couplings were originally developed for N2 and N3 sleep data, based on the specific characteristics of these stages. These methods are widely recognized in sleep research (Hahn et al., 2020; Helfrich et al., 2019; Helfrich et al., 2018; Ngo, Fell, & Staresina, 2020; Schreiner et al., 2022; Schreiner et al., 2021; Staresina et al., 2015; Staresina et al., 2023). However, because this percentile-based detection approach will inherently identify a certain number of events if applied to other stages (e.g., N1 and REM), the nature of these events in those stages remains unclear compared to N2/N3. We nevertheless identified and reported the detailed descriptive statistics of these sleep rhythms in all sleep stages, under the same operational definitions, both for completeness and as a sanity check. Within the same subject, there should be more SOs, spindles, and their couplings in N2/N3 than in N1 or REM (see also Figure S2-S4, Table S1-S4).”

      Supplementary Materials, Page 42-54, Table S1-S4

      (Given their length, we do not list all the tables here. Please refer to the revised manuscript.)

      Reviewer #3 (Public review):

      Summary:

      Wang et al., examined the brain activity patterns during sleep, especially when locked to those canonical sleep rhythms such as SO, spindle, and their coupling. Analyzing data from a large sample, the authors found significant coupling between spindles and SOs, particularly during the upstate of the SO. Moreover, the authors examined the patterns of whole-brain activity locked to these sleep rhythms. To understand the functional significance of these brain activities, the authors further conducted open-ended cognitive state decoding and found a variety of cognitive processing may be involved during SO-spindle coupling and during other sleep events. The authors next investigated the functional connectivity analyses and found enhanced connectivity between the hippocampus, the thalamus, and the medial PFC. These results reinforced the theoretical model of sleep-dependent memory consolidation, such that SO-spindle coupling is conducive to systems-level memory reactivation and consolidation.

      Strengths:

      There are obvious strengths in this work, including the large sample size, state-of-the-art neuroimaging and neural oscillation analyses, and the richness of results.

      Weaknesses:

      Despite these strengths and the insights gained, there are weaknesses in the design, the analyses, and inferences.

      Thank you for your detailed and thoughtful review of our manuscript. We are delighted that you recognize our advanced analysis methods, the richness of our neuroimaging and neural oscillation results, and the large sample size. In the following sections, we provide detailed responses to each of your comments. We have revised the text and conducted additional analyses to strengthen our arguments and clarify our methodological choices. We believe these revisions enhance the clarity and rigor of our work, and we sincerely appreciate your thoughtful feedback in helping us refine the manuscript.

      (1) A repeating statement in the manuscript is that brain activity could indicate memory reactivation and thus consolidation. This is indeed a highly relevant question that could be informed by the current data/results. However, an inherent weakness of the design is that there is no memory task before and after sleep. Thus, it is difficult (if not impossible) to make a strong argument linking SO/spindle/coupling-locked brain activity with memory reactivation or consolidation.

      We appreciate your suggestion regarding the lack of a pre- and post-sleep memory task in our study design. We acknowledge that, in the absence of behavioral measures, it is hard to directly link SO-spindle coupling to memory consolidation in an outcome-driven manner. Our interpretation is instead based on the well-established role of these oscillations in memory processes, as demonstrated in previous studies. We sincerely appreciate this feedback and will adjust our Discussion accordingly to reflect a more precise interpretation of our findings.

      Discussion, Page 18 Lines 333-341

      “Despite providing new insights, our study has several limitations. First, our scalp EEG did not directly capture hippocampal ripples, preventing us from conclusively demonstrating triple coupling. Second, the combination of EEG-fMRI and the lack of a memory task limit our ability to parse fine-grained BOLD responses at the DOWN- vs. UP-states of SOs and link observed activations to behavioral outcomes. Third, the use of large anatomical ROIs may mask subregional contributions of specific thalamic nuclei or hippocampal subfields. Finally, without a memory task, we cannot establish a direct behavioral link between sleep-rhythm-locked activation and memory consolidation. Future studies combining techniques such as ultra-high-field fMRI or iEEG with cognitive tasks may refine our understanding of subregional network dynamics and functional significance during sleep.”

      (2) Relatedly, to understand the functional implications of the sleep rhythm-locked brain activity, the authors employed the "open-ended cognitive state decoding" method. While this method is interesting, it is rather indirect given that there were no behavioral indices in the manuscript. Thus, discussions based on these analyses are speculative at best. Please either tone down the language or find additional evidence to support these claims.

      Moreover, the results from this method are difficult to understand. Figure 3e showed that for all three types of sleep events (SO, spindle, SO-spindle), the same mental states (e.g., working memory, episodic memory, declarative memory) showed opposite directions of activation (left and right panels showed negative and positive activation, respectively). How to interpret these conflicting results? This ambiguity is also reflected by the term used: declarative memory and episodic memories are both indexed in the results. Yet these two processes can be largely overlapped. So which specific memory processes do these brain activity patterns reflect? The Discussion shall discuss these results and the limitations of this method.

      We appreciate your critical assessment of the open-ended cognitive state decoding method and its interpretational challenges. Given the concerns about the indirectness of this approach, we decided to remove its related content and results from Figure 3 in the main text and include it in Supplementary Figure 7.

      Due to the complexity of memory-related processes, we acknowledge that distinguishing between episodic and declarative memory based solely on this approach is not straightforward. We will revise the Supplementary Materials to explicitly discuss these limitations and clarify that our findings do not isolate specific cognitive processes but rather suggest general associations with memory-related networks.

      Discussion, Page 17-18 Lines 323-332

      “To explore functional relevance, we employed an open-ended cognitive state decoding approach using meta-analytic data (NeuroSynth: Yarkoni et al. (2011)). Although this method usefully generates hypotheses about potential cognitive processes, particularly in the absence of a pre- and post-sleep memory task, it is inherently indirect. Many cognitive terms showed significant associations (16 of 50), such as “episodic memory,” “declarative memory,” and “working memory.” We focused on episodic/declarative memory given the known link with hippocampal reactivation (Diekelmann & Born, 2010; Staresina et al., 2015; Staresina et al., 2023). Nonetheless, these inferences regarding memory reactivation should be interpreted cautiously without direct behavioral measures. Future research incorporating explicit tasks before and after sleep would more rigorously validate these potential functional claims.”

      (3) The coupling strength is somehow inconsistent with prior results (Hahn et al., 2020, eLife, Helfrich et al., 2018, Neuron). Specifically, Helfrich et al. showed that among young adults, the spindle is coupled to the peak of the SO. Here, the authors reported that the spindles were coupled to down-to-up transitions of SO and before the SO peak. It is possible that participants' age may influence the coupling (see Helfrich et al., 2018). Please discuss the findings in the context of previous research on SO-spindle coupling.

      We appreciate your concern regarding the temporal characteristics of SO-spindle coupling. We acknowledge that the SO-spindle coupling phase results in our study are not identical to those reported by Hahn et al. (2020) and Helfrich et al. (2018). However, these differences may arise from slight variations in event detection parameters, which can influence the precise phase estimation of coupling. Notably, Hahn et al. (2020) also reported slight discrepancies in their group-level coupling phase results, highlighting that methodological differences can contribute to variability across studies. Furthermore, our findings are consistent with those of Schreiner et al. (2021), further supporting the robustness of our observations.

      That said, we acknowledge that our original description of SO-spindle coupling as occurring at the "transition from the lower state to the upper state" was not entirely precise. The -π/2 phase represents the true transition point, while our observed coupling phase is actually closer to the SO peak rather than strictly at the transition. We will revise this statement in the manuscript to ensure clarity and accuracy in describing the coupling phase.
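
A minimal sketch of how a preferred coupling phase can be summarized, assuming the convention implied above (0 = SO peak, ±π = SO trough, -π/2 = DOWN-to-UP transition). The phase values and the function name are hypothetical and do not come from the study's data; the circular mean is a standard summary for angular data, not necessarily the exact procedure used in the manuscript.

```python
import cmath
import math

def circular_mean(phases):
    """Return (mean direction in radians, mean resultant length) of the phases."""
    if not phases:
        raise ValueError("phases must not be empty")
    # Average the unit vectors that the phases define on the complex plane.
    vec = sum(cmath.exp(1j * p) for p in phases) / len(phases)
    return cmath.phase(vec), abs(vec)

# Hypothetical SO phases (radians) at which spindle peaks occurred.
spindle_phases = [-0.7, -0.6, -0.5, -0.4, -0.3]
mean_phase, resultant_length = circular_mean(spindle_phases)

# A phase between -pi/2 and 0 lies after the transition and before the SO peak.
before_peak = -math.pi / 2 < mean_phase < 0
# Compare the distance to the peak (0) with the distance to the transition (-pi/2).
closer_to_peak = abs(mean_phase) < abs(mean_phase + math.pi / 2)
```

With these illustrative values the mean phase falls before the SO peak but closer to the peak than to the -π/2 transition, which is the distinction the revised wording draws.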

      Discussion, Page 16 Lines 283-291

      “Our data provide insights into the neurobiological underpinnings of these sleep rhythms. SOs, originating mainly in neocortical areas such as the mPFC, alternate between DOWN- and UP-states. The thalamus generates sleep spindles, which in turn couple with SOs. Our finding that spindle peaks consistently occurred slightly before the UP-state peak of SOs (in 83 out of 107 participants), concurs with prior studies, including Schreiner et al. (2021). Yet it differs from some results suggesting spindles might peak right at the SO UP-state (Hahn et al., 2020; Helfrich et al., 2018). Such discrepancies could arise from differences in detection algorithms, participant age (Helfrich et al., 2018), or subtle variations in cortical-thalamic timing. Nonetheless, these results underscore the importance of coordinated SO-spindle interplay in supporting sleep-dependent processes.”

      (4) The discussion is rather superficial with only two pages, without delving into many important arguments regarding the possible functional significance of these results. For example, the author wrote, "This internal processing contrasts with the brain patterns associated with external tasks, such as working memory." Without any references to working memory, and without delineating why WM is considered as an external task even working memory operations can be internal. Similarly, for the interesting results on SO and reduced DMN activity, the authors wrote "The DMN is typically active during wakeful rest and is associated with self-referential processes like mind-wandering, daydreaming, and task representation (Yeshurun, Nguyen, & Hasson, 2021). Its reduced activity during SOs may signal a shift towards endogenous processes such as memory consolidation." This argument is flawed. DMN is active during self-referential processing and mind-wandering, i.e., when the brain shifts from external stimuli processing to internal mental processing. During sleep, endogenous memory reactivation and consolidation are also part of the internal mental processing given the lack of external environmental stimulation. So why during SO or during memory consolidation, the DMN activity would be reduced? Were there differences in DMN activity between SO and SO-spindle coupling events?

      We appreciate your concerns regarding the brevity of the discussion and the need for clearer theoretical arguments. We will expand this section to provide more in-depth interpretations of our findings in the context of prior literature. Regarding working memory (WM), we acknowledge that our phrasing was ambiguous. We will modify this statement in the Discussion section.

      For the SO-related reduction in DMN activity, we recognize the need for a more precise explanation. This reduced DMN activity may reflect large-scale neural inhibition characteristic of the SO trough. The DMN is typically active during internally oriented cognition (e.g., self-referential processing or mind-wandering) and is suppressed during external stimuli processing (Yeshurun, Nguyen, & Hasson, 2021). It is unlikely, however, that this suppression of DMN during SO events is related to a shift from internal cognition to external responses given it is during deep sleep time. Instead, it could be driven by the inherent rhythmic pattern of SOs, which makes it difficult to separate UP- from DOWN-states (the two temporal regressors were highly correlated, and similar brain activation during SOs events was obtained if modelled at the SO peak instead). Since the amplitude at the SO trough is consistently larger than that at the SO peak, the neural activation we detected may primarily capture the large-scale inhibition from DOWN-state.

      To address your final question, we have conducted an additional post hoc comparison of DMN activity between isolated SOs and SO-spindle coupling events. Our results indicate that DMN activation during SOs was significantly lower than during SO-spindle coupling (t<sub>(106)</sub> = -4.17, p < 1e-4). This suggests that SO-spindle coupling may involve distinct neural dynamics that partially re-engage DMN-related processes, possibly reflecting memory-related reactivation. We appreciate your constructive feedback and will integrate these expanded analyses and discussions into our revised manuscript.
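
      A within-subject comparison with 106 degrees of freedom is consistent with a paired test across the 107 participants. The sketch below shows how such a paired t statistic is formed from per-subject differences; the function name and the four example values are hypothetical and do not reproduce the reported result.

```python
import math
import statistics

def paired_t(cond_a, cond_b):
    """Paired t statistic and degrees of freedom for two matched samples."""
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    n = len(diffs)
    mean_d = statistics.fmean(diffs)
    sd_d = statistics.stdev(diffs)          # sample SD (n - 1 denominator)
    t_stat = mean_d / (sd_d / math.sqrt(n))
    return t_stat, n - 1

# Hypothetical DMN ROI estimates for four subjects (arbitrary units).
dmn_so = [-0.5, -0.2, -0.4, -0.3]
dmn_coupling = [0.1, 0.0, 0.2, -0.1]
t_stat, df = paired_t(dmn_so, dmn_coupling)
```

      A negative statistic here means lower DMN estimates during isolated SOs than during coupling, which is the direction reported above.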

      Results, Page 11 Lines 199-208

      “Spindles were correlated with positive activation in the thalamus (ROI analysis, t<sub>(106)</sub> = 15.39, p < 1e-4), the anterior cingulate cortex (ACC), and the putamen, alongside deactivation in the DMN (Fig. 3c). Notably, SO-spindle coupling was linked to significant activation in both the thalamus (ROI analysis, t<sub>(106)</sub> = 3.38, p = 0.0005) and the hippocampus (ROI analysis, t<sub>(106)</sub> = 2.50, p = 0.0070, Fig. 3d). However, no decrease in DMN activity was found during SO-spindle coupling, and DMN activity during SO was significantly lower than during coupling (ROI analysis, t<sub>(106)</sub> = -4.17, p < 1e-4). For more detailed activation patterns, see Table S5-S7. We also varied the threshold used to detect SO events to assess its effect on hippocampal activation during SO-spindle coupling and observed that hippocampal activation remained significant when the percentile thresholds for SO detection ranged between 71% and 80% (see Fig. S6).”

      Discussion, Page 17-18 Lines 308-332

      “An intriguing aspect of our findings is the reduced DMN activity during SOs when modeled at the SO trough (DOWN-state). This reduced DMN activity may reflect large-scale neural inhibition characteristic of the SO trough. The DMN is typically active during internally oriented cognition (e.g., self-referential processing or mind-wandering) and is suppressed during external stimuli processing (Yeshurun, Nguyen, & Hasson, 2021). It is unlikely, however, that this suppression of DMN during SO events is related to a shift from internal cognition to external responses given it is during deep sleep time. Instead, it could be driven by the inherent rhythmic pattern of SOs, which makes it difficult to separate UP- from DOWN-states (the two temporal regressors were highly correlated, and similar brain activation during SOs events was obtained if modelled at the SO peak instead, Fig. S5). Since the amplitude at the SO trough is consistently larger than that at the SO peak, the neural activation we detected may primarily capture the large-scale inhibition from DOWN-state. Interestingly, no such DMN reduction was found during SO-spindle coupling, implying that coupling may involve distinct neural dynamics that partially re-engage DMN-related processes, possibly reflecting memory-related reactivation. Future research using high-temporal-resolution techniques like iEEG could clarify these possibilities.

      To explore functional relevance, we employed an open-ended cognitive state decoding approach using meta-analytic data (NeuroSynth: Yarkoni et al. (2011)). Although this method usefully generates hypotheses about potential cognitive processes, particularly in the absence of a pre- and post-sleep memory task, it is inherently indirect. Many cognitive terms showed significant associations (16 of 50), such as “episodic memory,” “declarative memory,” and “working memory.” We focused on episodic/declarative memory given the known link with hippocampal reactivation (Diekelmann & Born, 2010; Staresina et al., 2015; Staresina et al., 2023). Nonetheless, these inferences regarding memory reactivation should be interpreted cautiously without direct behavioral measures. Future research incorporating explicit tasks before and after sleep would more rigorously validate these potential functional claims.”

      Reviewing Editor Comment:

      The reviewers think that you are working on a relevant and important topic. They are praising the large sample size used in the study. The reviewers are not all in line regarding the overall significance of the findings, but they all agree the paper would strongly benefit from some extra work, as all reviewers raise various critical points that need serious consideration.

      We appreciate your recognition of the relevance and importance of our study, as well as your acknowledgment of the large sample size as a strength of our work. We understand that there are differing perspectives regarding the overall significance of our findings, and we value the constructive critiques provided. We are committed to addressing the key concerns raised by all reviewers, including refining our analyses, clarifying our interpretations, and incorporating additional discussions to strengthen the manuscript. Below, we address your specific recommendations and provide responses to each point you raised to ensure our methods and results are as transparent and comprehensible as possible. We believe that these revisions will significantly enhance the rigor and impact of our study, and we sincerely appreciate your thoughtful feedback in helping us improve our work.

      Reviewer #1 (Recommendations for the authors):

      (1) The phrase "overnight sleep" suggests an entire night, while these were rather "nocturnal naps". Please rephrase.

      Thank you for pointing this out. We have revised the phrasing in our manuscript to "nocturnal naps" instead of "overnight sleep" to more accurately reflect the duration of the sleep recordings.

      (2) Sleep staging results (macroscopic sleep architecture) should be provided in more detail (at least min and % of the different sleep stages, sleep onset latency, total sleep duration, total recording duration), at least mean/SD/range.

      Thank you for this suggestion. We will provide comprehensive tables in the supplementary materials containing descriptive information about sleep-related characteristics. This information will give a clearer overview of the macroscopic sleep architecture in our dataset.

      Supplementary Materials, Page 42, Table S1

      Author response table 1.

      Descriptive results of demographic information and sleep characteristics. Note: The total recorded time is equal to the awake time plus the total sleep time. The sleep onset latency is the time taken to reach the first sleep epoch. The Sleep Efficiency is the ratio of actual sleep time to total recording time.
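
      The three definitions in the note above can be made concrete with a short sketch. It assumes a hypnogram scored in 30-second epochs with "W" marking wake; both the epoch length and the stage labels are assumptions for illustration, not details taken from the manuscript.

```python
def sleep_summary(hypnogram, epoch_sec=30):
    """Summarize a hypnogram given as one stage label per epoch ('W' = wake)."""
    epoch_min = epoch_sec / 60
    total_recorded = len(hypnogram) * epoch_min
    total_sleep = sum(1 for s in hypnogram if s != "W") * epoch_min
    awake = total_recorded - total_sleep
    # Sleep onset latency: time elapsed before the first non-wake epoch.
    first_sleep = next((i for i, s in enumerate(hypnogram) if s != "W"), None)
    latency = None if first_sleep is None else first_sleep * epoch_min
    efficiency = total_sleep / total_recorded if total_recorded else float("nan")
    return {
        "total_recorded_min": total_recorded,
        "awake_min": awake,
        "total_sleep_min": total_sleep,
        "sleep_onset_latency_min": latency,
        "sleep_efficiency": efficiency,
    }

# Hypothetical 10-minute recording.
hypnogram = ["W"] * 4 + ["N1"] * 2 + ["N2"] * 10 + ["W"] * 2 + ["N2"] * 2
summary = sleep_summary(hypnogram)
```

      In this toy recording the total recorded time (10 min) equals the awake time (3 min) plus the total sleep time (7 min), matching the relationship stated in the note.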

      Reviewer #2 (Recommendations for the authors):

      In order to allow for a better estimation of the reliability of the detected sleep events, please:

      (1) Provide densities and absolute numbers of all detected SOs and spindles (N1, NREM, and REM sleep).

      Thank you for pointing this out. We will provide comprehensive tables in the supplementary materials containing detailed information about sleep waves at each sleep stage for all 107 subjects (Table S2-S4), listing for each subject: 1) Duration of each sleep stage; 2) Number of detected SOs; 3) Number of detected spindles; 4) Number of detected SO-spindle coupling events; 5) Density of detected SOs; 6) Density of detected spindles; 7) Density of detected SO-spindle coupling events.
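
      For readers unfamiliar with event densities, the sketch below shows one common way to derive them from the counts and stage durations listed above. It assumes density is expressed as events per minute of the corresponding stage; the unit and all numbers are illustrative assumptions rather than values from Tables S2-S4.

```python
def event_density(n_events, stage_minutes):
    """Events per minute of a sleep stage; None if the stage never occurred."""
    if stage_minutes <= 0:
        return None
    return n_events / stage_minutes

# Hypothetical values for one subject.
stage_minutes = {"N1": 12.0, "N2&N3": 45.0, "REM": 0.0}
spindle_counts = {"N1": 6, "N2&N3": 180, "REM": 0}
densities = {
    stage: event_density(spindle_counts[stage], stage_minutes[stage])
    for stage in stage_minutes
}
```

      Returning None for a stage with zero duration keeps an absent stage distinct from a stage that was present but contained no events.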

      Supplementary Materials, Page 43-54, Table S2-S4

      (Given their length, we do not list all the tables here. Please refer to the revised manuscript.)

      (2) Show ERPs for all detected SOs and spindles (per sleep stage).

      Thank you for the suggestion. We will provide ERPs for all detected SOs and spindles, separated by sleep stage (N1, N2&N3, and REM) in supplementary Fig. S2-S4. These ERP waveforms will help illustrate the characteristic temporal profiles of SOs and spindles across different sleep stages.

      Methods, Page 25, Line 525-532

      “Event-related potentials (ERP) analysis. After completing the detection of each sleep rhythm event, we performed ERP analyses for SOs, spindles, and coupling events in different sleep stages. Specifically, for SO events, we took the trough of the DOWN-state of each SO as the zero-time point, then extracted data in a [-2 s to 2 s] window from the broadband (0.1–30 Hz) EEG and used [-2 s to -0.5 s] for baseline correction; the results were then averaged across 107 subjects (see Fig. S2a). For spindle events, we used the peak of each spindle as the zero-time point and applied the same data extraction window and baseline correction before averaging across 107 subjects (see Fig. S2b). Finally, for SO-spindle coupling events, we followed the same procedure used for SO events (see Fig. 2a, Figs. S3–S4).”
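
      The epoching procedure described in this excerpt can be sketched as follows: cut a [-2 s, 2 s] window around each event, subtract the mean of the [-2 s, -0.5 s] baseline, and average. This is a simplified single-channel illustration in plain Python; the function name, the sampling rate, and the toy signal are assumptions, and averaging across subjects would repeat the same mean one level up.

```python
def erp(signal, event_samples, sfreq, tmin=-2.0, tmax=2.0, baseline=(-2.0, -0.5)):
    """Baseline-corrected average of epochs time-locked to event_samples."""
    pre = int(round(-tmin * sfreq))
    post = int(round(tmax * sfreq))
    # Baseline indices relative to the epoch start (half-open interval).
    b0 = int(round((baseline[0] - tmin) * sfreq))
    b1 = int(round((baseline[1] - tmin) * sfreq))
    epochs = []
    for ev in event_samples:
        if ev - pre < 0 or ev + post >= len(signal):
            continue  # skip events too close to the recording edges
        epoch = signal[ev - pre: ev + post + 1]
        base = sum(epoch[b0:b1]) / (b1 - b0)
        epochs.append([v - base for v in epoch])
    if not epochs:
        return []
    return [sum(col) / len(epochs) for col in zip(*epochs)]

# Toy signal at 10 Hz: constant offset of 5 with a trough of depth 3 at each event.
sfreq = 10
signal = [5.0] * 200
for ev in (50, 120):
    signal[ev] = 2.0
average = erp(signal, [5, 50, 120], sfreq)  # the event at sample 5 is skipped
```

      After baseline correction the constant offset is removed, so the averaged waveform is zero outside the trough and -3 at time zero.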

      Supplementary Materials, Page 36-38, Fig. S2-S4

      Author response image 1.

      ERPs of SOs and spindles during different sleep stages across all 107 subjects. a. ERP of SOs in different sleep stages using the broadband (0.1–30 Hz) EEG data. We align the trough of the DOWN-state of each SO at time zero (see Methods for details). The orange line represents the SO ERP in the N1 stage, the black line represents the SO ERP in the N2&N3 stage, and the green line represents the SO ERP in the REM stage. b. ERP of spindles in different sleep stages using the broadband (0.1–30 Hz) EEG data. We align the peak of each spindle at time zero (see Methods for details). The color scheme is the same as in panel a.

      Author response image 2.

      ERP and time-frequency patterns of SO-spindle coupling in the N1 stage. The averaged temporal frequency pattern and ERP across all instances of SO-spindle coupling, computed over all subjects, following the same procedure as in Fig. 2a, but for N1 stage.

      Author response image 3.

      ERP and time-frequency patterns of SO-spindle coupling in the REM stage. The averaged temporal frequency pattern and ERP across all instances of SO-spindle coupling, computed over all subjects, again following the same procedure as in Fig. 2a, but for REM stage.

      (3) Provide detailed info concerning sleep characteristics (time spent in each sleep stage etc.).

      Thank you for this suggestion. As in the response above, we will provide comprehensive tables in the supplementary materials containing descriptive information about sleep-related characteristics.

      Supplementary Materials, Page 42, Table S1 (same as above)

      (4) What would happen if more stringent parameters were used for event detection? Would the authors still observe a significant number of SO spindles during N1 and REM? Would this affect the fMRI-related results?

      Thank you for this suggestion. Our methods for detecting SOs, spindles, and their couplings were originally developed for N2 and N3 sleep data, based on the specific characteristics of these stages. These methods are widely recognized in sleep research (Hahn et al., 2020; Helfrich et al., 2019; Helfrich et al., 2018; Ngo, Fell, & Staresina, 2020; Schreiner et al., 2022; Schreiner et al., 2021; Staresina et al., 2015; Staresina et al., 2023). However, because this percentile-based detection approach will inherently identify a certain number of events if applied to other stages (e.g., N1 and REM), the nature of these events in those stages remains unclear compared to N2/N3. We nevertheless identified and reported the detailed descriptive statistics of these sleep rhythms in all sleep stages, under the same operational definitions, both for completeness and as a sanity check. Within the same subject, there should be more SOs, spindles, and their couplings in N2/N3 than in N1 or REM (see also Figure S2-S4, Table S1-S4).

      Furthermore, in order to explore the impact of this on our fMRI results, we conducted an additional sensitivity analysis by applying different detection parameters for SOs. Specifically, we adjusted amplitude percentile thresholds for SO detection (the parameter that has the greatest impact on the results). We used the hippocampal activation value during N2&N3 stage SO-spindle coupling as an anchor value and found that when the parameters gradually became stricter, the results were similar to or even better than the current results. However, when we continued to increase the threshold, the results began to gradually decrease until the threshold was increased to 80%, and the results were no longer significant. This indicates that our results are robust within a specific range of parameters, but as the threshold increases, the number of trials decreases, ultimately weakening the statistical power of the fMRI analysis.
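
      The trade-off described above, where a stricter percentile threshold leaves fewer events, can be illustrated with a small sketch. It assumes each candidate SO is summarized by a single amplitude value and that events above the chosen percentile are retained; the helper names and the amplitude values are hypothetical.

```python
def percentile(values, q):
    """Linear-interpolation percentile of values for q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def detect_events(amplitudes, q):
    """Indices of candidate events whose amplitude exceeds the q-th percentile."""
    threshold = percentile(amplitudes, q)
    return [i for i, a in enumerate(amplitudes) if a > threshold]

# Hypothetical candidate amplitudes (arbitrary units).
amplitudes = list(range(1, 101))
counts = {q: len(detect_events(amplitudes, q)) for q in (75, 80, 90)}
```

      Because a percentile rule always keeps a fixed fraction of candidates, the number of retained events falls as the threshold rises, which in turn reduces the number of trials entering the fMRI model.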

      Thank you again for your suggestions on sleep rhythm event detection. We will add the results in Supplementary and revise our manuscript accordingly.

      Results, Page 11, Line 199-208

      “Spindles were correlated with positive activation in the thalamus (ROI analysis, t<sub>(106)</sub> = 15.39, p < 1e-4), the anterior cingulate cortex (ACC), and the putamen, alongside deactivation in the DMN (Fig. 3c). Notably, SO-spindle coupling was linked to significant activation in both the thalamus (ROI analysis, t<sub>(106)</sub> = 3.38, p = 0.0005) and the hippocampus (ROI analysis, t<sub>(106)</sub> = 2.50, p = 0.0070, Fig. 3d). However, no decrease in DMN activity was found during SO-spindle coupling, and DMN activity during SO was significantly lower than during coupling (ROI analysis, t<sub>(106)</sub> = -4.17, p < 1e-4). For more detailed activation patterns, see Table S5-S7. We also varied the threshold used to detect SO events to assess its effect on hippocampal activation during SO-spindle coupling and observed that hippocampal activation remained significant when the percentile thresholds for SO detection ranged between 71% and 80% (see Fig. S6).”

      Supplementary Materials, Page 40, Fig. S6

      Author response image 4.

      Influence of the percentile threshold for SO detection on hippocampal activation (ROI) during SO-spindle coupling. We changed the percentile threshold for SO event detection in the EEG data analysis and then reconstructed the GLM design matrix based on the SO events detected at each threshold. The brain-wide activation pattern of SO-spindle couplings in the N2/3 stage was extracted using the same method as shown in Fig. 3. The gray horizontal line represents the significant range (71%–80%). * p < 0.05.

      Finally, we sincerely thank you all again for your thoughtful and constructive feedback. Your insights have been invaluable in refining our analyses, strengthening our interpretations, and improving the clarity and rigor of our manuscript. We appreciate the time and effort you have dedicated to reviewing our work, and we are grateful for the opportunity to enhance our study based on your recommendations.

      References:

      Bergmann, T. O., Mölle, M., Diedrichs, J., Born, J., & Siebner, H. R. (2012). Sleep spindle-related reactivation of category-specific cortical regions after learning face-scene associations. NeuroImage, 59(3), 2733-2742.

      Buzsáki, G. (2015). Hippocampal sharp wave‐ripple: A cognitive biomarker for episodic memory and planning. Hippocampus, 25(10), 1073-1188.

      Caporro, M., Haneef, Z., Yeh, H. J., Lenartowicz, A., Buttinelli, C., Parvizi, J., & Stern, J. M. (2012). Functional MRI of sleep spindles and K-complexes. Clinical neurophysiology, 123(2), 303-309.

      Coulon, P., Budde, T., & Pape, H.-C. (2012). The sleep relay—the role of the thalamus in central and decentral sleep regulation. Pflügers Archiv-European Journal of Physiology, 463, 53-71.

      Crunelli, V., Lőrincz, M. L., Connelly, W. M., David, F., Hughes, S. W., Lambert, R. C., Leresche, N., & Errington, A. C. (2018). Dual function of thalamic low-vigilance state oscillations: rhythm-regulation and plasticity. Nature Reviews Neuroscience, 19(2), 107-118.

      Czisch, M., Wehrle, R., Stiegler, A., Peters, H., Andrade, K., Holsboer, F., & Sämann, P. G. (2009). Acoustic oddball during NREM sleep: a combined EEG/fMRI study. PloS one, 4(8), e6749.

      Diba, K., & Buzsáki, G. (2007). Forward and reverse hippocampal place-cell sequences during ripples. Nature Neuroscience, 10(10), 1241.

      Diekelmann, S., & Born, J. (2010). The memory function of sleep. Nature Reviews Neuroscience, 11(2), 114-126.

      Fogel, S., Albouy, G., King, B. R., Lungu, O., Vien, C., Bore, A., Pinsard, B., Benali, H., Carrier, J., & Doyon, J. (2017). Reactivation or transformation? Motor memory consolidation associated with cerebral activation time-locked to sleep spindles. PloS one, 12(4), e0174755.

      Hahn, M. A., Heib, D., Schabus, M., Hoedlmoser, K., & Helfrich, R. F. (2020). Slow oscillation-spindle coupling predicts enhanced memory formation from childhood to adolescence. Elife, 9, e53730.

      Halassa, M. M., Siegle, J. H., Ritt, J. T., Ting, J. T., Feng, G., & Moore, C. I. (2011). Selective optical drive of thalamic reticular nucleus generates thalamic bursts and cortical spindles. Nature Neuroscience, 14(9), 1118-1120.

      Hale, J. R., White, T. P., Mayhew, S. D., Wilson, R. S., Rollings, D. T., Khalsa, S., Arvanitis, T. N., & Bagshaw, A. P. (2016). Altered thalamocortical and intra-thalamic functional connectivity during light sleep compared with wake. NeuroImage, 125, 657-667.

      Helfrich, R. F., Lendner, J. D., Mander, B. A., Guillen, H., Paff, M., Mnatsakanyan, L., Vadera, S., Walker, M. P., Lin, J. J., & Knight, R. T. (2019). Bidirectional prefrontal-hippocampal dynamics organize information transfer during sleep in humans. Nature Communications, 10(1), 3572.

      Helfrich, R. F., Mander, B. A., Jagust, W. J., Knight, R. T., & Walker, M. P. (2018). Old brains come uncoupled in sleep: slow wave-spindle synchrony, brain atrophy, and forgetting. Neuron, 97(1), 221-230. e224.

      Horovitz, S. G., Fukunaga, M., de Zwart, J. A., van Gelderen, P., Fulton, S. C., Balkin, T. J., & Duyn, J. H. (2008). Low frequency BOLD fluctuations during resting wakefulness and light sleep: A simultaneous EEG‐fMRI study. Human brain mapping, 29(6), 671-682.

      Huang, Q., Xiao, Z., Yu, Q., Luo, Y., Xu, J., Qu, Y., Dolan, R., Behrens, T., & Liu, Y. (2024). Replay-triggered brain-wide activation in humans. Nature Communications, 15(1), 7185.

      Ilhan-Bayrakcı, M., Cabral-Calderin, Y., Bergmann, T. O., Tüscher, O., & Stroh, A. (2022). Individual slow wave events give rise to macroscopic fMRI signatures and drive the strength of the BOLD signal in human resting-state EEG-fMRI recordings. Cerebral Cortex, 32(21), 4782-4796.

      Laufs, H. (2008). Endogenous brain oscillations and related networks detected by surface EEG‐combined fMRI. Human brain mapping, 29(7), 762-769.

      Laufs, H., Walker, M. C., & Lund, T. E. (2007). ‘Brain activation and hypothalamic functional connectivity during human non-rapid eye movement sleep: an EEG/fMRI study’—its limitations and an alternative approach. Brain, 130(7), e75.

      Margulies, D. S., Ghosh, S. S., Goulas, A., Falkiewicz, M., Huntenburg, J. M., Langs, G., Bezgin, G., Eickhoff, S. B., Castellanos, F. X., & Petrides, M. (2016). Situating the default-mode network along a principal gradient of macroscale cortical organization. Proceedings of the National Academy of Sciences, 113(44), 12574-12579.

      Massimini, M., Huber, R., Ferrarelli, F., Hill, S., & Tononi, G. (2004). The sleep slow oscillation as a traveling wave. Journal of Neuroscience, 24(31), 6862-6870.

      Moehlman, T. M., de Zwart, J. A., Chappel-Farley, M. G., Liu, X., McClain, I. B., Chang, C., Mandelkow, H., Özbay, P. S., Johnson, N. L., & Bieber, R. E. (2019). All-night functional magnetic resonance imaging sleep studies. Journal of neuroscience methods, 316, 83-98.

      Molle, M., Bergmann, T. O., Marshall, L., & Born, J. (2011). Fast and slow spindles during the sleep slow oscillation: disparate coalescence and engagement in memory processing. Sleep, 34(10), 1411-1421.

      Ngo, H.-V., Fell, J., & Staresina, B. (2020). Sleep spindles mediate hippocampal-neocortical coupling during long-duration ripples. Elife, 9, e57011.

      Picchioni, D., Horovitz, S. G., Fukunaga, M., Carr, W. S., Meltzer, J. A., Balkin, T. J., Duyn, J. H., & Braun, A. R. (2011). Infraslow EEG oscillations organize large-scale cortical– subcortical interactions during sleep: a combined EEG/fMRI study. Brain research, 1374, 63-72.

      Schabus, M., Dang-Vu, T. T., Albouy, G., Balteau, E., Boly, M., Carrier, J., Darsaud, A., Degueldre, C., Desseilles, M., & Gais, S. (2007). Hemodynamic cerebral correlates of sleep spindles during human non-rapid eye movement sleep. Proceedings of the National Academy of Sciences, 104(32), 13164-13169.

      Schreiner, T., Kaufmann, E., Noachtar, S., Mehrkens, J.-H., & Staudigl, T. (2022). The human thalamus orchestrates neocortical oscillations during NREM sleep. Nature communications, 13(1), 5231.

      Schreiner, T., Petzka, M., Staudigl, T., & Staresina, B. P. (2021). Endogenous memory reactivation during sleep in humans is clocked by slow oscillation-spindle complexes. Nature Communications, 12(1), 3112.

      Singh, D., Norman, K. A., & Schapiro, A. C. (2022). A model of autonomous interactions between hippocampus and neocortex driving sleep-dependent memory consolidation. Proceedings of the National Academy of Sciences, 119(44), e2123432119.

      Spoormaker, V. I., Schröter, M. S., Gleiser, P. M., Andrade, K. C., Dresler, M., Wehrle, R., Sämann, P. G., & Czisch, M. (2010). Development of a large-scale functional brain network during human non-rapid eye movement sleep. Journal of Neuroscience, 30(34), 11379-11387.

      Staresina, B. P., Bergmann, T. O., Bonnefond, M., van der Meij, R., Jensen, O., Deuker, L., Elger, C. E., Axmacher, N., & Fell, J. (2015). Hierarchical nesting of slow oscillations, spindles and ripples in the human hippocampus during sleep. Nature Neuroscience, 18(11), 1679-1686.

      Staresina, B. P., Niediek, J., Borger, V., Surges, R., & Mormann, F. (2023). How coupled slow oscillations, spindles and ripples coordinate neuronal processing and communication during human sleep. Nature Neuroscience, 1-9.

      Yarkoni, T., Poldrack, R. A., Nichols, T. E., Van Essen, D. C., & Wager, T. D. (2011). Large-scale automated synthesis of human functional neuroimaging data. Nature methods, 8(8), 665-670.

      Yeshurun, Y., Nguyen, M., & Hasson, U. (2021). The default mode network: where the idiosyncratic self meets the shared social world. Nature Reviews Neuroscience, 1-12.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors describe a massively parallel reporter assay (MPRA) screen focused on identifying polymorphisms in 5' and 3' UTRs that affect translation efficiency and thus might have a functional impact on cells. The topic is of timely interest, and indeed, several related efforts have recently been published and preprinted (e.g., https://pubmed.ncbi.nlm.nih.gov/37516102/ and https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10635273/). This study has several major issues with the results and their presentation.

      Major comments:

      (1) The main issue is that it appears that the screen has largely failed, yet the reasons for that are unclear, which makes it difficult to interpret. The authors start with a library that includes approximately 6,000 variants, which makes it a medium-sized MPRA. But then, only 483 pairs of WT/mutated UTRs yield high-confidence information, which is already a small number for any downstream statistical analysis, particularly since most don't actually affect translation in the reporter screen setting (which is not unexpected). It is unclear why >90% of the library did not give high-confidence information. The profiles presented as base-case examples in Figure 2B don't look very informative or convincing. All the subsequent analysis is done on a very small set of UTRs that have an effect, and it is unclear to this reviewer how these can yield statistically significant and/or biologically relevant associations.

      To make sure our final results are technically and statistically sound, we applied stringent selection criteria and cutoffs in our analytics workflow. First, from our RNA-seq dataset, we filtered the UTRs with at least 20 reads in a polysome profile across all three repeated experiments. Secondly, in the following main analysis using a negative binomial generalized linear model (GLM), we further excluded the UTRs that displayed batch effect, i.e. their batch-related main effect and interaction are significant. We believe our measure has safeguarded the filtered observations (UTRs) from the (potential) high variation of our massively parallel translation assays and thus gives high confidence to our results.
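
      The first filtering step can be sketched as below. The sketch assumes one reading of the criterion, namely that a UTR is kept only if its polysome profile (reads summed over the Mono, Light, and Heavy fractions) reaches at least 20 reads in every one of the three replicates; the data layout, the function name, and the counts are hypothetical.

```python
MIN_READS = 20  # threshold stated in the response

def passes_read_filter(profiles, min_reads=MIN_READS):
    """profiles: one dict per replicate mapping fraction name -> read count."""
    return all(sum(rep.values()) >= min_reads for rep in profiles)

# Hypothetical read counts for two UTRs across three replicates.
utr_kept = [
    {"Mono": 10, "Light": 8, "Heavy": 5},
    {"Mono": 12, "Light": 9, "Heavy": 7},
    {"Mono": 15, "Light": 6, "Heavy": 4},
]
utr_dropped = [
    {"Mono": 10, "Light": 8, "Heavy": 5},
    {"Mono": 3, "Light": 2, "Heavy": 1},   # below threshold in one replicate
    {"Mono": 15, "Light": 6, "Heavy": 4},
]
kept = passes_read_filter(utr_kept)
dropped = passes_read_filter(utr_dropped)
```

      Requiring the threshold in every replicate is conservative: a single low-coverage replicate removes the UTR, which helps explain why a strict filter can leave only a fraction of the library.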

      Regarding the interpretation of Figure 2B, since we aimed to identify the UTRs whose interaction term of genotype and fractions is significant in our generalized linear model, it is statistically conventional to double-check the interaction of the two variables using such a graph. For instance, in the top left panel of Figure 2B (5'UTR of ANK2:c.-39G>T), we can see that read counts of WT samples congruously decreased from Mono to Light, whereas the read counts of mutant samples were roughly the same in the two fractions – the trend is different between WT and mutant. Ergo, the distinct distribution patterns of two genotypes across three fractions in Figure 2B offer the readers a convincing visual supplement to our statistics from GLM.

      In contrast to Figure 2B, the graphs of nonsignificant UTRs (shown below) reveal that the trends between the two genotypes are similar across the 'Mono and Light' and 'Light and Heavy' polysome fractions. Importantly, our analysis remains unaffected by differential expression levels between WT and mutant, as it specifically distinguishes polysome profiles with different distributions. This consistent trend further supports the lack of interaction between genotype and polysome fractions for these UTRs.

      Author response image 1.

      Figure: Examples of non-significant UTR pairs in massively parallel polysome profiling assays.

      (2) From the variants that had an effect, the authors go on to carry out some protein-level validations and see some changes, but it is not clear if those changes are in the same direction as observed in the screen.

      To infer the directionality of translation efficiency from polysome profiles, a common approach involves pooling polysome fractions and comparing them with free or monosome fractions to identify 'translating' fractions. However, this method has two major potential pitfalls: (i) it sacrifices resolution and does not account for potential bias toward light or heavy polysomes, and (ii) it fails to account for discrepancies between polysome load and actual protein output (as discussed in https://doi.org/10.1016/j.celrep.2024.114098 and https://doi.org/10.1038/s41598-019-47424-w). Therefore, our analysis focused on the changes within polysome profiles themselves. 'Significant' candidates were identified based on a significant interaction between genotype and polysome distribution using a negative binomial generalized linear model, without presupposing the direction of change on protein output. 

      (3) The authors follow up on specific motifs and specific RBPs predicted to bind them, but it is unclear how many of the hits in the screen actually have these motifs, or how significant motifs can arise from such a small sample size.

      We calculated the Δmotif enrichment in significant UTRs versus nonsignificant UTRs using Fisher’s exact test. For example, the enrichment of the Δ‘AGGG’ motif in 3’ UTRs is shown below:

      Author response table 1.

      This test yields a P-value of 0.004167 by Fisher’s exact test. The P-values and Odds ratios of Δmotifs in relation to polysome shifting are included in Supplementary Table S4, and we will update the detailed motif information in the revised Supplementary Table S4.
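
      For reference, a two-sided Fisher's exact test on a 2×2 table can be computed directly from the hypergeometric distribution. The sketch below is a generic standard-library implementation; the counts are hypothetical, since the contingency table itself is not reproduced here, so it is not expected to return the P-value quoted above.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)

# Hypothetical counts: rows = significant / nonsignificant UTRs,
# columns = motif changed / motif unchanged.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
odds_ratio = (3 * 3) / (1 * 1)
```

      The small relative tolerance in the comparison guards against floating-point ties between tables that are equally likely in exact arithmetic.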

      (4) It is particularly puzzling how the authors can build a machine learning predictor with >3,000 features when the dataset they use for training the model has just a few dozens of translation-shifting variants.

      We understand the concern regarding the relatively small number of translation-shifting variants compared to the large number of features. To address this, we employed LASSO regression, which, according to The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman, is particularly suitable for datasets where the number of features 𝑝 is much larger than the number of samples 𝑁. LASSO effectively performs feature selection by shrinking less important coefficients to zero, allowing us to build a robust and generalizable model despite the limited number of variants.
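
      To illustrate how the L1 penalty sets coefficients exactly to zero when features outnumber samples, here is a minimal coordinate-descent sketch for the objective (1/(2N))·‖y − Xβ‖² + λ‖β‖₁. It is a teaching example with a hypothetical toy design matrix, not the implementation or the feature set used in the study.

```python
def soft_threshold(x, lam):
    """Shrink x toward zero by lam; values within [-lam, lam] become 0."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/(2N))*||y - X b||^2 + lam*||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X @ beta, starting from beta = 0
    for _ in range(n_iter):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            z = sum(v * v for v in col) / n
            if z == 0:
                continue
            rho = sum(col[i] * (resid[i] + col[i] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / z
            delta = new - beta[j]
            if delta:
                for i in range(n):
                    resid[i] -= col[i] * delta
                beta[j] = new
    return beta

# Toy design with N = 4 samples and p = 6 features; only feature 0 drives y.
X = [
    [1, 1, 1, 1, 0.5, 0.1],
    [-1, 1, -1, 1, 0.5, -0.1],
    [1, -1, -1, 1, -0.5, -0.1],
    [-1, -1, 1, 1, -0.5, 0.1],
]
y = [2.0, -2.0, 2.0, -2.0]
beta = lasso_cd(X, y, lam=0.5)
```

      In this example the five uninformative coefficients are exactly zero and the informative one is shrunk from 2 to 1.5, showing both the selection and the shrinkage that the penalty introduces.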

      (5) The lack of meaningful validation experiments altering the SNPs in the endogenous loci by genome editing limits the impact of the results.

      We plan to assess the endogenous effect by generating CRISPR knock-in clones carrying the UTR variant.

      Reviewer #2 (Public Review):

      Summary:

      In their paper "Massively Parallel Polyribosome Profiling Reveals Translation Defects of Human Disease‐Relevant UTR Mutations" the authors use massively parallel polysome profiling to determine the effects of 5' and 3' UTR SNPs (from dbSNP/ClinVar) on translational output. They show that some UTR SNPs cause a change in the polysome profile with respect to the wild-type and that pathogenic SNPs are enriched in the polysome-shifting group. They validate that some changes in polysome profiles are predictive of differences in translational output using transiently expressed luciferase reporters. Additionally, they identify sequence motifs enriched in the polysome-shifting group. They show that 2 enriched 5' UTR motifs increase the translation of a luciferase reporter in a protein-dependent manner, highlighting the use of their method to identify translational control elements.

      Strengths:

      This is a useful method and approach, as UTR variants have been more difficult to study than coding variants. Additionally, their evidence that pathogenic mutations are more likely to cause changes in polysome association is well supported.

      Weaknesses:

      The authors acknowledge that they "did not intend to immediately translate the altered polysome profile into an increase or decrease in translation efficiency, as the direction of the shift was not readily evident. Additionally, sedimentation in the sucrose gradient may have been partially affected by heavy particles other than ribosomes." However, shifted polysome distribution is used as a category for many downstream analyses. Without further clarity or subdivision, it is very difficult to interpret the results (for example in Figure 5A, is it surprising that the polysome shifting mutants decrease structure? Are the polysome "shifts" towards the untranslated or heavy fractions?)

      Our approach, combining polysome fractionation of the UTR library with negative binomial generalized linear model (GLM) analysis of RNA-seq data, systematically identifies variants that affect translational efficiency. The GLM model is specifically designed to detect UTR pairs with significant interactions between genotype and polysome fractions, relying solely on changes in polysome profiles to identify variants that disrupt translation. Consequently, our analytical method does not determine the direction of translation alteration.

      Following the massively parallel polysome profiling, we sought to understand how these polysome-shifting variants influence the translation process. To do this, we examined their effects on RNA characteristics related to translation, such as RBP binding and RNA structure. In Figure 5A, we observed a notable trend in significant hits within 5’ UTRs: they tend to increase ΔG (weaker folding energy) in response to changes in polysome profiles, regardless of whether protein production increases or decreases (Fig. 3).

    1. Author Response:

      Reviewer #1 (Public Review):

      Despite numerous studies on quinidine therapies for epilepsies associated with GOF mutant variants of Slack, there is no consensus on its utility due to contradictory results. In this study Yuan et al. investigated the role of different sodium selective ion channels on the sensitization of Slack to quinidine block. The study employed electrophysiological approaches, FRET studies, genetically modified proteins and biochemistry to demonstrate that Nav1.6 N- and C-tail interacts with Slack's C-terminus and significantly increases Slack sensitivity to quinidine blockade in vitro and in vivo. This finding inspired the authors to investigate whether they could rescue Slack GOF mutant variants by simply disrupting the interaction between Slack and Nav1.6. They find that the isolated C-terminus of Slack can reduce the current amplitude of Slack GOF mutant variants co-expressed with Nav1.6 in HEK cells and prevent Slack induced seizures in mouse models of epilepsy. This study adds to the growing list of channels that are modulated by protein-protein interactions, and is of great value for future therapeutic strategies.

      I have a few comments with regard to how Nav1.6 sensitize Slack to block by quinidine.

      (1) It is not clear to me if the Slack induced current amplitude varies depending on the specific Nav subtype. To this end, it would be valuable to test if Slack open probability is affected by the presence of specific Nav subtypes. Nav induced differences in Slack current amplitude and open probability could explain why individual Nav subtypes show varied ability to sensitize Slack to quinidine blockade.

      We thank the reviewer for raising this point. To address whether the whole-cell current amplitudes of Slack vary depending on the specific NaV subtype, we examined Slack current amplitudes upon co-expression of Slack with specific NaV subtypes in HEK293 cells. The results show no significant differences in Slack current amplitudes upon co-expression of Slack with different NaV channel subtypes (Author response image 1), suggesting that whole-cell Slack current amplitudes cannot explain the varied ability of NaV subtypes to sensitize Slack to quinidine blockade. To investigate the effect of different NaV channel subtypes on Slack open probability, we will perform single-channel recordings in future studies.

      Author response image 1.

      The amplitudes of Slack currents upon co-expression of Slack with specific NaV subtypes in HEK293 cells. ns, p > 0.05, one-way ANOVA followed by Bonferroni’s post hoc test.

      (2) It has previously been shown that INaP (persistent sodium current) is important for inducing Slack currents. Here the authors show that INaT (transient sodium current) of Nav1.6 is necessary for the sensitization of Slack to quinidine block whereas INaP surprisingly has no effect. The authors then show that the N-tail together with C-tail of Nav1.6 can induce same effect on Slack as full-length Nav1.6 in presence of high intracellular concentrations of sodium. However, it is not clear to me how the isolated N- and C-tail of Nav1.6 can induce sensitization of Slack to quinidine by interacting with C-terminus of Slack, while sensitization also is dependent on INaT. The authors speculate on different Slack open conformation, but one could speculate if there is a missing link, such as an un-identified additional interacting protein that causes the coupling.

      We fully agree on the importance of investigating the detailed mechanism underlying the sensitization of Slack to quinidine blockade mediated by the N- and C-termini of NaV1.6. Regarding the possibility of additional interacting proteins (a “missing link”) that mediate the coupling between Slack and NaV1.6, our GST pull-down assays involving Slack and the N- and C-termini of NaV1.6 (Fig. S7) suggest a direct interaction between Slack and NaV1.6 channels. This finding leads us to consider that a requirement for additional interacting proteins might be excluded. To further address these questions, we plan to employ structural biology methods, such as cryo-electron microscopy (cryo-EM).

      Reviewer #2 (Public Review):

      This is a very interesting paper about the coupling of Slack and Nav1.6 and the insight this brings to the effects of quinidine to treat some epilepsy syndromes.

      Slack is a sodium-activated potassium channel that is important to hyperpolarization of neurons after an action potential. Slack is encoded by KCNT1 which has mutations in some epilepsy syndromes. These types of epilepsy are treated with quinidine but this is an atypical antiseizure drug, not used for other types of epilepsy. For sufficient sodium to activate Slack, Slack needs to be close to a channel that allows robust sodium entry, like Na channels or AMPA receptors, but more mechanistic information is not available. Of particular interest to the authors is what allows quinidine to be effective in reducing Slack.

      In the manuscript, the authors show that Nav, not AMPA receptors, are responsible for Slack activation, at least in cultured neurons (HEK293, primary cortical neurons). Most of the paper focuses on the evidence that Nav1.6 promotes Slack sensitivity to quinidine.

      (1) The paper is very well written although there are reservations about the use of non-neuronal cells or cultured primary neurons rather than a more intact system.

      We appreciate the reviewer's positive evaluation of our work. We acknowledge that utilizing a more intact system would provide valuable insights into the inhibitory effect of quinidine on Slack-NaV1.6. However, there are certain challenges associated with studying Slack currents in their entirety.

      First, in our experiments, isolating Slack currents from Na+-activated K+ currents in an intact system is challenging as selective inhibitors for Slick are currently unavailable. To address this, we propose using Slick gene knockout mice to specifically measure Slack currents under physiological conditions in future investigations. Second, we have observed that the interaction between Slack and NaV1.6 primarily occurs at the axon initial segment of neurons. This poses a difficulty when using brain slices for measurements, as employing the whole-cell voltage-clamp technique to assess Slack at the axon initial segment may introduce systematic errors.

      We believe that testing the pharmacological effects of quinidine on Slack-NaV1.6 in primary neurons remains the optimal approach. Although non-neuronal cells or cultured primary neurons may not fully replicate the complexity of an intact system, they still provide valuable insights into the interactions between Slack and NaV1.6, and the effects of quinidine.

      (2) I also have questions about the figures.

      We will make the necessary modifications and clarifications based on the reviewer's comments:

      (3) Finally, riluzole is not a selective drug, so the limitations of this drug should be discussed.

      We thank the reviewer for raising this point. We will discuss the limitations of riluzole in our revised version of the manuscript.

      (4) On a minor point, the authors use the term in vivo but there are no in vivo experiments.

      We thank the reviewer for raising this point. Although we did not conduct experiments directly in living organisms, our results demonstrated the co-immunoprecipitation of NaV1.6 with Slack in homogenates from mouse cortical and hippocampal tissues (Fig. 3C). This result may support that the interaction between Slack and NaV1.6 occurs in vivo.

      Reviewer #3 (Public Review):

      Yuan et al., set out to examine the role of functional and structural interaction between Slack and NaVs on the Slack sensitivity to quinidine. Through pharmacological and genetic means they identify NaV1.6 as the privileged NaV isoform in sensitizing Slack to quinidine. Through biochemical assays, they then determine that the C-terminus of Slack physically interacts with the N- and C-termini of NaV1.6. Using the information gleaned from the in vitro experiments the authors then show that virally-mediated transduction of Slack's C-terminus lessens the extent of SlackG269S-induced seizures. These data uncover a previously unrecognized interaction between a sodium and a potassium channel, which contributes to the latter's sensitivity to quinidine.

      The conclusions of this paper are mostly well supported by data, but some aspects of functional and structural studies in vivo as well as physical interaction need to be clarified and extended.

      (1) Immunolabeling of the hippocampus CA1 suggests sodium channels as well as Slack colocalization with AnkG (Fig 3A). Proximity ligation assay for NaV1.6 and Slack or a super-resolution microscopy approach would be needed to increase confidence in the presented colocalization results. Furthermore, coimmunoprecipitation studies on the membrane fraction would bolster the functional relevance of NaV1.6-Slack interaction on the cell surface.

      We thank the reviewer for these good suggestions. We acknowledge that employing proximity ligation assay and high-resolution techniques would significantly enhance our understanding of the localization of the Slack-NaV1.6 coupling.

      At present, the technical capabilities available in our laboratory and institution do not support high-resolution testing. However, we are enthusiastic about exploring potential collaborations to address these questions in the future. Furthermore, we fully recognize the importance of conducting co-immunoprecipitation (Co-IP) assays from membrane fractions. While we have already completed Co-IP assays for total protein and quantified the FRET efficiency values between Slack and NaV1.6 in the membrane region, the Co-IP assays on membrane fractions will be conducted in our future investigations.

      (2) Although hippocampal slices from Scn8a+/- were used for studies in Fig. S8, it is not clear whether Scn8a-/- or Scn8a+/- tissue was used in other studies (Fig 1J & 1K). It will be important to clarify whether genetic manipulation of NaV1.6 expression (Fig. 1K) has an impact on sodium-activated potassium current, level of surface Slack expression, or that of NaV1.6 near Slack.

      We thank the reviewer for pointing this out. In Fig. 1G,J,K, primary cortical neurons from homozygous NaV1.6 knockout (Scn8a-/-) mice were used. We will clarify this information in the revised manuscript. In terms of the effects of genetic manipulation of NaV1.6 expression on IKNa and surface Slack expression, we compared the amplitudes of IKNa measured from homozygous NaV1.6 knockout (NaV1.6-KO) neurons and wild-type (WT) neurons. The results showed that homozygous knockout of NaV1.6 does not alter the amplitudes of IKNa (Author response image 2). The level of surface Slack expression will be tested further.

      Author response image 2.

      The amplitudes of IKNa in WT and NaV1.6-KO neurons (data from manuscript Fig. 1K). ns, p > 0.05, unpaired two-tailed Student’s t test.

      (3) Did the epilepsy-related Slack mutations have an impact on NaV1.6-mediated sodium current?

      We thank the reviewer for this question. We examined the amplitudes of NaV1.6 sodium current upon expression of NaV1.6 alone or co-expression of NaV1.6 with epilepsy-related Slack mutations (K629N, R950Q, K985N). The results showed that the tested epilepsy-related Slack mutations do not alter the amplitudes of NaV1.6 sodium current (Author response image 3).

      Author response image 3.

      The amplitudes of NaV1.6 sodium currents upon co-expression of NaV1.6 with epilepsy-related Slack mutant variants (SlackK629N, SlackR950Q, and SlackK985N). ns, p>0.05, one-way ANOVA followed by Bonferroni’s post hoc test.

      (4) Showing the impact of quinidine on persistent sodium current in neurons and on NaV1.6-expressing cells would further increase confidence in the role of persistent sodium current on sensitivity of Slack to quinidine.

      We appreciate the reviewer’s question. Previous studies have shown that quinidine can inhibit persistent sodium currents at low concentrations [1]. In our experiments, blocking persistent sodium currents by application of riluzole in the bath solution showed no significant effects on the sensitivity of Slack to quinidine blockade upon co-expression of Slack with NaV1.6 (Fig. 2F,H). This result suggested that persistent sodium currents were not involved in the sensitization of Slack to quinidine blockade.

      1. Ju YK, Saint DA, Gage PW. Effects of lignocaine and quinidine on the persistent sodium current in rat ventricular myocytes. Br J Pharmacol. Oct 1992; 107(2):311-6. doi:10.1111/j.1476-5381.1992.tb12743.x
    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary

      The authors describe a method for gastruloid formation using mouse embryonic stem cells (mESCs) to study YS and AGM-like hematopoietic differentiation. They characterise the gastruloids during nine days of differentiation using a number of techniques including flow cytometry and single-cell RNA sequencing. They compare their findings to a published data set derived from E10-11.5 mouse AGM. At d9, gastruloids were transplanted under the adrenal gland capsule of immunocompromised mice to look for the development of cells capable of engrafting the mouse bone marrow. The authors then applied the gastruloid protocol to study overexpression of Mnx1 which causes infant AML in humans.

      In the introduction, the authors define their interpretation of the different waves of hematopoiesis that occur during development. 'The subsequent wave, known as definitive, produces: first, oligopotent erythro-myeloid progenitors (EMPs) in the YS (E8-E8.5); and later myelo-lymphoid progenitors (MLPs - E9.5-E10), multipotent progenitors (MPPs - E10-E11.5), and hematopoietic stem cells (HSCs - E10.5-E11.5), in the aorta-gonad-mesonephros (AGM) region of the embryo proper.' Herein they designate the yolk sac-derived wave of EMP hematopoiesis as definitive, according to convention, although paradoxically it does not develop from intraembryonic mesoderm or give rise to HSCs.

      The apparent perplexity of the Reviewer with our definition of primitive and definitive waves is somewhat surprising, as it is widely used in the field (e.g. PMID: 18204427; PMID: 28299650; PMID: 33681211). Definitive haematopoiesis, encompassing EMP, MLP, MPP and HSC, highlights their origin from haemogenic endothelium, generation of mature cells with adult characteristics from progenitors with multilineage potential and direct and indirect developmental contributions to the intra-embryonic and time-restricted generation of HSCs.

      General comments

      The authors make the following claims in the paper:

      (1) The development of a protocol for hemogenic gastruloids (hGx) that recapitulates YS and AGM-like waves of blood from HE.

      (2) The protocol recapitulates both YS and EMP-MPP embryonic blood development 'with spatial and temporal accuracy'.

      (3) The protocol generates HSC precursors capable of short-term engraftment in an adrenal niche.

      (4) Overexpression of MNX1 in hGx transforms YS EMP to 'recapitulate patient transcriptional signatures'.

      (5) hGx is a model to study normal and leukaemic embryonic hematopoiesis.

      There are major concerns with the manuscript. The statements and claims made by the authors are not supported by the data presented, data is overinterpreted, and the conclusions cannot be justified. Furthermore, the data is presented in a way that makes it difficult for the reader to follow the narrative, causing confusion. The authors have not discussed how their hGx compares to the previously published mouse embryoid body protocols used to model early development and hematopoiesis.

      Specific points

      (1) It is claimed that HGxs capture cellularity and topography of developmental blood formation. The hGx protocol described in the manuscript is a modification of a previously published gastruloid protocol (Rossi et al 2022). The rationale for the protocol modifications is not fully explained or justified. There is a lack of novelty in the presented protocol as the only modifications appear to be the inclusion of Activin A and an extension of the differentiation period from 7 to 9 days of culture. No direct comparison has been made between the two versions of gastruloid differentiation to justify the changes.

      The Reviewer paradoxically claims that the protocol is not novel and that it differs from a previous publication in at least 2 ways – the patterning pulse and the length of the protocol. Of these, the patterning pulse is key. As documented in Fig. S1, we cannot obtain Flk1-GFP expression in the absence of Activin A. Expression of Flk1 is a fundamental step in haemato-endothelial specification and, accordingly, we do not see CD41 or CD45+ cells in the absence of Activin A. Also, in our hands, there is a clear time-dependent progression of marker expression, with sequential acquisition of CD41 and CD45, with the latter not detectable until 192h (Fig. 1C-D), another key difference relative to the Rossi et al (2022) protocol. The 192h-timepoint, we argue in the manuscript, and present further evidence for in this rebuttal, corresponds to the onset of AGM-like haematopoiesis. We have empirically extended the protocol to maximise the CD45+ cell output (Fig. S1B-D).

      The inclusion of Activin A at high concentration at the beginning of differentiation would be expected to pattern endoderm rather than mesoderm. BMP signaling is required to induce Flk1+ mesoderm, even in the presence of Wnt.

      Again, we call the Reviewer’s attention to Fig. S1 which clearly shows that Activin A (with no BMP added) is required for induction of Flk1 expression, in the presence of Wnt. Activin A in combination with Wnt, is used in other protocols of haemato-endothelial differentiation from pluripotent cells, with no BMP added in the same step of patterning and differentiation (PMID: 39227582; PMID: 39223325). In the latter protocol, we also call the Reviewer’s attention to the fact that a higher concentration of Activin A precludes the need for BMP4 addition. Finally, one of us has recently reported that Activin A, on its own, will induce FLK1, as well as other anterior mesodermal progenitors (https://www.biorxiv.org/content/10.1101/2025.01.11.632562v1). In addressing the Reviewer’s concerns with the dose of Activin A used, we titrated its concentration against activation of Flk1, confirming optimal Flk1-GFP expression at the 100ng/ml dose used in the manuscript.

      Author response image 1.

      Dose-dependent requirement of Activin A for induction of Flk1 expression in haemogenic gastruloids. Composite GFP and brightfield live imaging of Flk1-GFP haemogenic gastruloids at 96h. Images were acquired using a Cytation5 instrument (Thermo). Images are representative of 12 gastruloids per condition.

      FACS analysis of the hGx during differentiation is needed to demonstrate the co-expression of Flk1-GFP and lineage markers such as CD34 to indicate patterning of endothelium from Flk1+ mesoderm. The FACS plots in Fig. 1 show c-Kit expression but very little VE-cadherin which suggests that CD34 is not induced. Early endoderm expresses c-Kit, CXCR4, and Epcam, but not CD34 which could account for the lack of vascular structures within the hGx as shown in Fig. 1E.

      We were surprised by the Reviewer’s comment that there are no endothelial structures in our gastruloids. The presence of a Flk1-GFP+ network is visible in the GFP images in Fig.1B, from 144h onwards, also shown in Author response image 2A. In addition, our single-cell RNA-seq data, included in the manuscript, confirms the presence of endothelial cells with a developing endothelial, including arterial, programme. This can be seen in Fig. 2B, F of the manuscript and is represented in Author response image 2B. In contrast with the Reviewer’s claims that no endothelial cells are formed, the data show that Kdr (Flk1)+ cells co-express Cdh5/VE-Cadherin and indeed Cd34, attesting to the presence of an endothelial programme. Arterial markers Efnb2, Flt1, and Dll4 are present. A full-blown programme, which also includes haemogenic markers including Sox17, Esam, Cd44 and Mecom is clear at early (144h) and, particularly at late (192h) timepoints in cells sorted on detection of surface c-Kit (Author response image 2B). Further to the data shown in B, already present in the manuscript, we also document co-expression of Flk1-GFP and CD34 by flow cytometry (Author response image 2C).

      Author response image 2.

      Haemogenic gastruloids have a branched vascular network. A. Whole-mount confocal imaging of 144h-haemogenic gastruloids. B. Differentiation of an arterial endothelial programme in haemogenic gastruloids; single-cell RNA-seq data of differentiating haemogenic gastruloids, sorted on cell surface expression of c-Kit at 144 and 192h; gene expression colour scale from yellow (low) to orange (high); grey = no detectable expression. C. Flow cytometry plots of 216h-haemogenic gastruloids showing detection of haemato-endothelial marker CD34.

      (2) The protocol has been incompletely characterised, and the authors have not shown how they can distinguish between either wave of Yolk Sac (YS) hematopoiesis (primitive erythroid/macrophage and erythro-myeloid EMP) or between YS and intraembryonic Aorta-Gonad-Mesonephros (AGM) hematopoiesis. No evidence of germ layer specification has been presented to confirm gastruloid formation, organisation, and functional ability to mimic early development. Furthermore, differentiation of YS primitive and YS EMP stages of development in vitro should result in the efficient generation of CD34+ endothelial and hematopoietic cells. There is no flow cytometry analysis showing the kinetics of CD34 cell generation during differentiation. Benchmarking the hGx against developing mouse YS and embryo data sets would be an important verification.

      The Reviewer is correct that we have not provided detailed characterisation of the different germ layers, as this was not the focus of the study. In that context, we were surprised by the earlier comment assuming co-expression of c-kit, Cxcr4 and Epcam, which we did not show, while overlooking the endothelial programme reiterated above, which we have presented.

      Given our focus on haemato-endothelial specification, we have started the single-cell RNA-seq characterisation of the haemogenic gastruloid at 120h and have not looked specifically at earlier timepoints of embryo patterning.

      This said, we show the presence of neuroectodermal cells in cluster 9; on the other hand, cluster 7 includes hepatoblast-like cells, denoting endodermal specification. We are happy to include this characterisation, to the extent that it is present, in a revised version of the manuscript. However, in the absence of earlier timepoints and given the bias towards mesodermal specification, we expect that specification of ectodermal and endodermal programmes may be incomplete.

      In respect of the contention regarding the capture of YS-like and AGM-like haematopoiesis, we have presented evidence in the manuscript that haemogenic cells generated during gastruloid differentiation, particularly at late 192h and 216h timepoints project onto highly purified c-Kit+ CD31+ Gfi1-expressing cells from mouse AGM (PMID: 38383534), providing support for the recapitulation of the corresponding developmental stage. In distinguishing between YS-like and AGM-like haematopoiesis, we call the Reviewer’s attention to the replotting of the single-cell RNA-seq data already in the manuscript, which we provided in response to point 1 (Author response image 2B), which highlights an increase in Sox17, but not Sox18, expression in the 192h haemogenic endothelium, which suggests an association with AGM haematopoiesis (PMID: 20228271). A significant association of Cd44 and Procr expression with the same time-point (Fig. 2F in the manuscript), further supports an AGM-like endothelial-to-haematopoietic transition at the 192h timepoint.

      Following on the Reviewer’s comments about CD34, we also inspected co-expression of CD34 with CD41 and CD45, the latter co-expression present in, although not necessarily exclusive to, AGM haematopoiesis.

      Reassuringly, we observed clear co-expression with both markers (Author response image 3), in addition to a CD41+CD34-population, which likely reflects YS EMP-independent erythropoiesis. Interestingly, marker expression is responsive to the levels of Activin A used in the patterning pulse, with the 100ng/ml Activin A used in our protocol superior to 75ng/ml.

      Author response image 3.

      Association of CD34 with CD41 and CD45 expression is Activin A-responsive and supports the presence of definitive haematopoiesis. A. Flow cytometry analysis of CD34 and CD41 expression in 216h-haemogenic gastruloids; two doses of Activin A were used in the patterning pulse with CHIR99021 between 48-72h. FMO controls shown. B. Flow cytometry analysis of CD34 and CD45 at 216h in the same experimental conditions.

      We agree that it remains challenging to identify markers exclusive to AGM haematopoiesis, which is operationally equated with generation of transplantable haematopoietic stem cells. While HSC generation is a key event characteristic of the AGM, not all AGM haematopoiesis corresponds to HSCs, an important point in evaluating the data presented in the manuscript, and indeed acknowledged by us.

      Author response image 4.

      Clustering of haemogenic gastruloid cells sorted on the basis of haemato-endothelial surface markers CD41, c-Kit and CD45. A. Leiden clustering of single-cell RNA-seq data. B. Time stamps of sorted haemogenic gastruloid cells in A. C. Surface marker stamps of cells in A.

      Given the centrality of this point in comments by all the Reviewers, we have conducted projections of our single-cell RNA-seq data against two studies which (1) capture arterial and haemogenic specification in the para-splanchnopleura (pSP) and AGM region between E8.0 and E11 (Hou et al, PMID: 32203131), and (2) uniquely capture YS, AGM and FL progenitors and the AGM endothelial-to-haematopoietic transition (EHT) in the same scRNA-seq dataset (Zhu et al, PMID: 32392346).

      Focusing the analysis on the subsets of haemogenic gastruloid cells sorted as CD41+ (144h), c-Kit+ (144h and 192h) and CD45+ (192h and 216h) (Author response image 4A-C), we show:

      (1) That a subset of haemato-endothelial cells from haemogenic gastruloids at 144h to 216h project onto intra-embryonic cells spanning E8.25 to E10 (Author response image 5A-B). This is in agreement with our interpretation that 216h haemogenic gastruloids are no later than the MPP/pre-HSC state of embryonic development, requiring further maturation to generate long-term engrafting HSC.

      (2) That haemogenic gastruloids contain YS-like (including EMP-like) and AGM-like haematopoietic cells (Author response image 6A-B). Significantly, some of the cells, particularly c-Kit-sorted cells with a candidate endothelial and HE-like signature project onto AGM pre-HE and HE, as well as IAHC, and later, predominantly 216h cells, have characteristics of MPP/LMPP-like cells from the FL.

      Altogether, the data support the notion that haemogenic gastruloids capture YS and AGM haematopoiesis until E10, as suggested by us in the manuscript. We thought it was important to share this preliminary data with the Editors at an early stage, and we will incorporate a deeper analysis in a revised version of the manuscript.

      Single-cell RNA sequencing was used to compare hGx with mouse AGM. The authors incorrectly conclude that ' ..specification of endothelial and HE cells in hGx follows with time-dependent developmental progression into putative AGM-like HE..' And, '...HE-projected hGx cells.......expressed Gata2 but not Runx1, Myb, or Gfi1b..' Hemogenic endothelium is defined by the expression of Runx1 and Gfi1b is downstream of Runx1.

      As a hierarchy of regulation, Gata2 precedes and drives Runx1 expression at the specification of HE (PMID: 17823307; PMID: 24297996), while Runx1 drives the EHT, upstream of Gfi1b in haematopoietic clusters (PMID: 34517413).

      Author response image 5.

      Projection of sorted haemogenic gastruloid cells onto Hou et al dataset (PMID: 32203131) analysing development of mouse intra-embryonic haematopoiesis. A. Time signatures of Hou et al data. B. Projection of Leiden clusters in Author response image 4A. Methodology as described in our manuscript; 68% gastruloid cells projected.

      Author response image 6.

      Projection of sorted haemogenic gastruloid cells onto Zhu et al dataset (PMID: 32392346), capturing arterial endothelial and haemogenic endothelial development, in reference to YS, AGM and FL haematopoietic progenitors. A. Functional cluster classification as per Zhu et al. B. Projection of Leiden clusters in Author response image 4A. Methodology as detailed in our manuscript; 58% gastruloid cells projected. Haematopoietic clusters annotated as in A.

      (3) The hGx protocol 'generates hematopoietic SC precursors capable of short-term engraftment' is not supported by the data presented. Short-term engraftment would be confirmed by flow cytometric detection of hematopoietic cells within the recipient bone marrow, spleen, thymus, and peripheral blood that expressed the BFP transgene. This analysis was not provided. PCR detection of transcripts, following an unspecified number of amplification cycles, as shown in Figure 3G (incorrectly referred to as Figure 3F in the legend) is not acceptable evidence for engraftment.

      We provide the full flow cytometry analysis of spleen engraftment in the 5 mice which received implantation of 216h-haemogenic gastruloids in the adrenal gland; an additional (control) animal received adrenal injection of PBS (Author response image 7). The animals were analysed at 4 weeks. In this experiment, the bone marrow collection was limiting, and material was prioritised for PCR.

      We had previously provided only representative plots of flow cytometry analysis of bone marrow and spleen in Fig. S4E, which we described as low-level engraftment. The analysis was complemented with genomic DNA PCR, where detection was present in only some of the replicates tested per animal. We confirm that PCR analysis used a conventional 40 cycles; the sensitivity was shown in Fig. S4F. As shown in Fig. 3 A-C, no more than 7 CD45+CD144+ multipotent cells are present per haemogenic gastruloid, with 3 haemogenic gastruloids implanted in the adrenal gland of each transplanted animal. We argue that the low level of cytometric and molecular engraftment at 4 weeks, from haemogenic gastruloid-derived progenitors that have not progressed beyond a stage equivalent to E10 (Author response image 5A-B) and that we have described as requiring additional maturation in vivo, is not surprising.
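
      For illustration, the upper bound on input cells implied by these numbers can be worked out explicitly. The sketch below only restates the figures given above (at most 7 CD45+CD144+ cells per haemogenic gastruloid, 3 haemogenic gastruloids per animal); the function and variable names are chosen for this example and are not part of any analysis pipeline.

```python
# Back-of-the-envelope upper bound on candidate multipotent cells per recipient.
# Inputs restate the response text; names are illustrative assumptions.
MAX_MULTIPOTENT_CELLS_PER_GASTRULOID = 7  # "no more than 7" (Fig. 3 A-C)
GASTRULOIDS_PER_ANIMAL = 3                # implanted in one adrenal gland


def max_cells_implanted(cells_per_gastruloid: int, gastruloids: int) -> int:
    """Largest number of candidate multipotent cells delivered to one animal."""
    return cells_per_gastruloid * gastruloids


upper_bound = max_cells_implanted(
    MAX_MULTIPOTENT_CELLS_PER_GASTRULOID, GASTRULOIDS_PER_ANIMAL
)
print(f"At most {upper_bound} candidate multipotent cells per animal")
```

      On these figures, each recipient receives at most 21 candidate multipotent cells, which is the context in which the low-level engraftment should be read.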

      Author response image 7.

      BFP engraftment of Nude recipient mice 4 weeks after unilateral adrenal implantation of 216h-haemogenic gastruloids. Flow cytometry analysis of spleen engraftment. Genomic PCR analysis is shown in Fig. 3G of the manuscript.

      Transplanted hGx formed teratoma-like structures, with hematopoietic cells present at the site of transplant only analysed histologically. Indeed, the quality of the images provided does not provide convincing validation that donor-derived hematopoietic cells were present in the grafts.

      As stated in the text, the images are meant to illustrate that the haemogenic gastruloids developed in situ. The observation of donor-derived blood cells in the implanted haemogenic gastruloids would not correspond to engraftment, as we have amply demonstrated that they have generated blood cells in vitro. There is no evidence that there are remaining pluripotent cells in the haemogenic gastruloid after 9 days of differentiation, and it is therefore not clear that these are teratomas.

      There is no justification for the authors' conclusion that '... the data suggest that 216h hGx generate AGM-like pre-HSC capable of at least short-term multilineage engraftment upon maturation...'. Indeed, this statement is in conflict with previous studies demonstrating that pre-HSCs in the dorsal aorta of the mouse embryo are immature and actually incapable of engraftment.

      We have clearly stated that we do not see haematopoietic engraftment through transplantation of dissociated haemogenic gastruloids, which reach the E10 state containing pre-HSC (Author response image 5). Instead, we observed rare myelo-erythroid (in the manuscript) and myelo-lymphoid (Author response image 9 below, in response to Reviewer 2) engraftment upon in vivo maturation of haemogenic gastruloids with preserved 3D organisation. These statements are not contradictory.

      The statement '...low-level production of engrafting cells recapitulates their rarity in vivo, in agreement with the embryo-like qualities of the gastruloid system....' is incorrect. Firstly, no evidence has been provided to show the hGx has formed a dorsal aorta facsimile capable of generating cells with engrafting capacity. Secondly, although engrafting cells are rare in the AGM, approximately one per embryo, they are capable of robust and extensive engraftment upon transplantation.

      We are happy to rephrase the statement to simply say that “…the data suggest that 216h haemogenic gastruloids contain candidate AGM-like progenitors with some short-term engraftment potential but incomplete functional maturation.” To be clear, with our existing statement we meant to highlight that the production of definitive AGM-like haematopoietic progenitors (not all of which are engrafting) in haemogenic gastruloids does not correspond to non-physiological single-lineage programming. We did not claim that we achieved production of HSC, which would be long-term engrafting.

      (4) Expression of MNX1 transcript and protein in hematopoietic cells in MNX1 rearranged acute myeloid leukaemia (AML) is one cause of AML in infants. In the hGX model of this disease, Mnx1 is overexpressed in the mESCs that are used to form gastruloids. Mnx1 overexpression seems to confer an overall growth advantage on the hGx and increase the serial replating capacity of the small number of hematopoietic cells that are generated. The inefficiency with which the hGx model generates hematopoietic cells makes it difficult to model this disease. The poor quality of the cytospin images prevents accurate identification of cells. The statement that the kit-expressing cells represent leukemic blast cells is not sufficiently validated to support this conclusion. What other stem cell genes are expressed? Surface kit expression also marks mast cells, frequently seen in clonogenic assays of blood cells. Flow cytometric and gene expression analyses using known markers would be required.

      The haemogenic gastruloid model generates haematopoietic and haemato-endothelial cells. MNX1 expands Kit+ cells at 144h, which we show to have a haemato-endothelial signature (manuscript Fig. 2B, which we replotted in Author response image 2B).

      Serial replating of CFC assays is a conventional in vitro assay of leukaemia transformation. Critically, colony replating is not maintained in EV control cells, attesting to the transformation potential of MNX1.

      Although we have not fully traced the cellular hierarchy of MNX1-driven transformation in the haemogenic gastruloid system, the in vitro replating expands a Kit+ cell (Fig. 5E), which reflects the surface phenotype of the leukaemia, also recapitulated in the mouse model initiated by MNX1-overexpressing FL cells. Importantly, it recapitulates the transcriptional profile of MNX1-leukaemia patients (Fig. 6C), which is uniquely expressed by MNX1 144h and replated colony cells, but not by MNX1 216h gastruloid cells, arguing against a generic signature of MNX1 overexpression (Fig. 6B). Importantly, the MNX1-transformation of haemogenic gastruloid cells is superior to the FL leukaemia model at capturing the unique transcriptional features of MNX1-driven leukaemia, distinct from other forms of AML in the same age group (Fig S7). It is possible that this corresponds to a preleukaemia event, and we will explore this in future studies, which are beyond the proof-of-principle nature of this paper.

      (5) In human infant MNX1 AML, the mutation is thought to arise at the fetal liver stage of development. There is no evidence that this developmental stage is mimicked in the hGx model.

      We never claim that the haemogenic gastruloid model mimics the foetal liver. We propose that susceptibility to MNX1 is at the HE-to-EMP transition. Moreover, and importantly, contrary to the Reviewer’s statement, there is no evidence in the literature that the mutation arises in the foetal liver stage, just that the mutation arises before birth (PMID: 38806630), which is different. In a mouse model of MNX1 overexpression, the authors achieve leukaemia engraftment upon MNX1 overexpression in foetal liver, but not in bone marrow cells (PMID: 37317878). This is in agreement with a vulnerability of embryonic / foetal, but not adult cells to the MNX1 expression caused by the translocation. However, haematopoietic cells in the foetal liver originate from YS and AGM precursors, so the origin of the MNX1-susceptible cells can be in those locations, rather than the foetal liver itself.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors develop an exciting new hemogenic gastruloid (hGX) system, which they claim reproduces the sequential generation of various blood cell types. The key advantage of this cellular system would be its potential to more accurately recapitulate the spatiotemporal emergence of hematopoietic progenitors within their physiological niche compared to other available in vitro systems. The authors present a large set of data and also validate their new system in the context of investigating infant leukemia.

      Strengths:

      The development of this new in vitro system for generating hematopoietic cells is innovative and addresses a significant drawback of current in vitro models. The authors present a substantial dataset to characterize this system, and they also validate its application in the context of investigating infant leukemia.

      Weaknesses:

      The thorough characterization and full demonstration that the cells produced truly represent distinct waves of hematopoietic progenitors are incomplete. The data presented to support the generation of late yolk sac (YS) progenitors, such as lymphoid cells, and aortic-gonad-mesonephros (AGM)-like progenitors, including pre-hematopoietic stem cells (pre-HSCs), by this system are not entirely convincing. Given that this is likely the manuscript's most crucial claim, it warrants further scrutiny and direct experimental validation. Ideally, the identity of these progenitors should be further demonstrated by directly assessing their ability to differentiate into lymphoid cells or fully functional HSCs. Instead, the authors primarily rely on scRNA-seq data and a very limited set of markers (e.g., Ikzf1 and Mllt3) to infer the identity and functionality of these cells. Many of these markers are shared among various types of blood progenitors, and only a well-defined combination of markers could offer some assurance of the lymphoid and pre-HSC nature of these cells, although this would still be limited in the absence of functional assays.

      The identification of a pre-HSC-like CD45⁺CD41⁻/lo c-Kit⁺VE-Cadherin⁺ cell population is presented as evidence supporting the generation of pre-HSCs by this system, but this claim is questionable. This FACS profile may also be present in progenitors generated in the yolk sac such as early erythro-myeloid progenitors (EMPs). It is only within the AGM context, and in conjunction with further functional assays demonstrating the ability of these cells to differentiate into HSCs and contribute to long-term repopulation, that this profile could be strongly associated with pre-HSCs. In the absence of such data, the cells exhibiting this profile in the current system cannot be conclusively identified as true pre-HSCs.

      At this preliminary response stage, we present 2 additional pieces of evidence to support our claims that we capture YS and AGM stages of haematopoietic development. In future experiments, we can complement these with functional assays, including co-culture with OP9 and OP9-DL stroma.

      Author response image 8.

      EZH2 inhibition affects CD41+ cellular output in haemogenic gastruloids at 144h, but not 216h. A. Flow cytometry analysis of CD41 expression in 144h-haemogenic gastruloid treated with 0.5μM EZH2 inhibitor GSK126 from 120h. DMSO (0.05%), vehicle. 1 of 2 independent experiments (average CD41+: DMSO, 21.20%; GSK126, 12.10%; CD45 not detected). B. Flow cytometry analysis of CD41 and CD45 expression in 216h gastruloids, treated with DMSO or GSK126. (DMSO: average CD41+, 15.28%; average CD45+ 0.46%. GSK126: average CD41+, 23.78%; average CD45+, 2.08%).

      In Author response images 5 and 6, we project our single-cell RNA-seq data onto (1) developing intra-embryonic pSP and AGM between E8 and E11 (Author response image 5) and (2) a single-cell RNA-seq study of HE development which combines haemogenic and haematopoietic cells from the YS, the developing HE and IAHC in the AGM, and FL (Author response image 6). Our data map onto E8.25-E10 (Author response image 5) and capture YS EMP and erythroid and myeloid progenitors, as well as AGM pre-HE, HE and IAHC, with some cells matching HSPC and LMPP (Author response image 6), as suggested by the projection onto the Thambyrajah et al data set (Fig. S3 in the manuscript).

      Given the difficulty in finding markers that specifically associate with AGM haematopoiesis, we inspected the possibility of capturing different regulatory requirements at different stages of gastruloid development mirroring differential effects in the embryo. Polycomb EZH2 is specifically required for EMP differentiation in the YS, but does not affect AGM-derived haematopoiesis; it is also not required for primitive erythroid cells (PMID: 29555646; PMID: 34857757). We treated haemogenic gastruloids from 120h onwards with either DMSO (0.05%) or GSK126 (0.5μM), and inspected the cellularity of gastruloids at 144h, which we equate with YS-EMP, and 216h – putatively AGM haematopoiesis (Author response image 8). We show that EZH2 inhibition / GSK126 treatment specifically reduces %CD41+ cells at 144h (Author response image 8A), but does not reduce %CD41+ or %CD45+ cells at 216h (Author response image 8B).

      Although preliminary, these data, together with the scRNA-seq projections described, provide evidence for our claim that 144h haemogenic gastruloids capture YS EMPs, while CD41+ and CD45+ cells isolated at 216h reflect AGM progenitors. We cannot draw conclusions as to the functional nature of the AGM cells from this experiment.

      The engraftment data presented are also not fully convincing, as the observed repopulation is very limited and evaluated only at 4 weeks post-transplantation. The cells detected after 4 weeks could represent the progeny of EMPs that have been shown to provide transient repopulation rather than true HSCs.

      We clearly state that there is low-level engraftment and do not claim to have generated HSC. We describe cells with short-term engraftment potential. Although the cells we show in the manuscript at 4 weeks could be EMPs (Author response image 7 and Fig. 3 and S3), we now have 8-week analysis of implant recipients, in which we observed engraftment of the recipient bone marrow, again at low level, in 1 of 3 animals (Author response image 9). This engraftment is myeloid-lymphoid and therefore likely to have originated in a later progenitor. To be clear, we do not claim that this corresponds to the presence of HSC. It nevertheless supports the maturation of progenitors with engraftment potential.

      Author response image 9.

      Flow cytometry analysis of BFP engraftment of recipient bone marrow 8 weeks post-implantation of 216h-haemogenic gastruloids in the adrenal gland of Nude mice. 1 of 3 animals shows BFP CD45+ engraftment in the myeloid (Mac1+) and B-lymphoid (B220+) lineages. 3 haemogenic gastruloids were implanted unilaterally in the adrenal gland of each animal. A. Engrafted animal, showing CD45+ BFP cells of myeloid (CD11b) and B-lymphoid affiliation (B220). B. Non-engrafted mouse recipient of haemogenic gastruloid implants.

      Reviewer #3 (Public review):

      In this study, the authors employ a mouse ES-derived "hemogenic gastruloid" model which they generated and which they claim to be able to deconvolute YS and AGM stages of blood production in vitro. This work could represent a valuable resource for the field. However, in general, I find the conclusions in this manuscript poorly supported by the data presented. Importantly, it isn't clear what exactly are the "YS" and the "AGM"-like stages identified in the culture and where is the data that backs up this claim. In my opinion, the data in this manuscript lack convincing evidence that can enable us to identify what kind of hematopoietic progenitor cells are generated in this system. Therefore, the statement that "our study has positioned the MNX1-OE target cell within the YS-EMP stage (line 540)" is not supported by the evidence presented in this study. Overall, the system seems to be very preliminary and requires further optimization before those claims can be made.

      Specific comments below:

      (1) The flow cytometric analysis of gastruloids presented in Figure 1 C-D is puzzling. There is a large % of c-Kit+ cells generated, but few VE-Cad+ Kit+ double positive cells. Similarly, there are many CD41+ cells, but very few CD45+ cells, which one would expect to appear toward the end of the differentiation process if blood cells are actually generated. It would be useful to present this analysis as consecutive gating (i.e. evaluating CD41 and CD45 within VE-Cad+ Kit+ cells, especially if the authors think that the presence of VE-Cad+ Kit+ cells is suggestive of EHT). The quantification presented in D is misleading as the scale of each graph is different.

      Fig. 1C-D provide an overview of haemogenic markers during the timecourse of haemogenic gastruloid differentiation, and do indeed show a late up-regulation of CD45, as the Reviewer points out would be expected. The %CD45+ cells is indeed low. However, we should point out that the haemogenic gastruloid protocol, although biased towards mesodermal outputs, does not aim to achieve pure haematopoietic specification, but rather to place it in its embryo-like context. Consecutive gating at the 216h-timepoint is shown and quantified in Fig. 3A-B. We refute that the scale is misleading. It is a necessity to represent the data in a way that is interpretable by the reader: the gates (in C) are truly representative and annotated, as are the plot axes (in D).

      (2) The imaging presented in Figure 1E is very unconvincing. C-Kit and CD45 signals appear as speckles and not as membrane/cell surfaces as they should. This experiment should be repeated and nuclear stain (i.e. DAPI) should be included.

      We include the requested images below (Author response image 10).

      Author response image 10.

      Confocal images of haematopoietic production in haemogenic gastruloids. Wholemount, cleared haemogenic gastruloids were stained for CD45 (pseudo-coloured red) and c-Kit antigens (pseudo-coloured yellow) with indirect staining, as described in the manuscript. Flk1-GFP signal is shown in green. Nuclei are contrasted with DAPI. (A) 192h. (B) 216h.

      (3) Overall, I am not convinced that hematopoietic cells are consistently generated in these organoids. The authors should sort hematopoietic cells and perform May-Grunwald Giemsa stainings as they did in Figure 6 to confirm the nature of the blood cells generated.

      The data are reproducible and complemented by functional assays shown in Fig. 3, which clearly demonstrate haematopoietic output. The single-cell RNA-seq data also show expression of a haematopoietic programme. Nevertheless, we include Giemsa-Wright’s stained cytospins obtained at 216h to illustrate haematopoietic output (Author response image 11). Inevitably, the cytospins will be inconclusive as to the presence of endothelial-to-haematopoietic transition or the generation of haematopoietic stem/progenitor cells, as these cells do not have a distinctive morphology.

      Author response image 11.

      Cytospin of dissociated haemogenic gastruloids at 216h. Cytospins were stained with Giemsa-Wright’s stain and are visualised with a 40x objective. Annotated are cells in the monocytic (dashed open arrow), granulocytic (solid open arrow), megakaryocytic (solid arrow) and erythroid (asterisk) lineages; arrowheads indicate cells with a non-specific blast-like morphology. Representative image.

      (4) The scRNAseq in Figure 2 is very difficult to interpret. Specific points related to this:

      - Cluster annotation in Figure 2a is missing and should be included.

      - Why do the heatmaps show the expression of genes within sorted cells? Couldn't the authors show expression within clusters of hematopoietic cells as identified transcriptionally (which ones are they? See previous point)? Gene names are illegible.

      - I see no expression of Hlf or Myb in CD45+ cells (Figure 2G). Hlf is not expressed by any of the populations examined (panels E, F, G). This suggests no MPP or pre-HSC are generated in the culture, contrary to what is stated in lines 242-245. (PMID 31076455 and 34589491).

      Later on, it is again stated that "hGx cells... lacked detection of HSC genes like Hlf, Gfi1, or Hoxa9" (lines 281-283). To me, this is proof of the absence of AGM-like hematopoiesis generated in those gastruloids.

      Author response image 12.

      Expression of endothelial, haemogenic and haematopoietic genes in haemogenic gastruloid cells sorted at 144h, 192h and 216h. UMAP as in Author response image 4. Pecam (CD31) and CD34 represent endothelial genes also detected in haemogenic endothelium. CD44 is specifically enriched at the endothelial-to-haemogenic transition. Mecom is detected in haemogenic endothelium and haematopoietic progenitors. Mllt3 and Runx1 are haematopoietic markers. Hoxa9 and Hlf are associated with haematopoietic stem and progenitor cells and their detection is rare in haemogenic gastruloids at 216h.

      For a combination of logistical and technical reasons, we performed single-cell RNA-seq using the Smart-Seq2 platform, which is inherently low throughput. We overcame the issue of cell coverage by complementing whole-gastruloid transcriptional profiling at successive time-points with sorting of subpopulations of cells based on individual markers documented in Fig. 1. We clearly stated which platform was used as well as the number and type of cells profiled (Fig. S2A and lines 172-179 of the manuscript), and our approach is standard. We will review our representation of the data in a revised manuscript. Nevertheless, at this stage, we provide plots of the expression of key haematopoietic markers over UMAPs of the haemogenic gastruloid timecourse (Author response image 12). We also show preliminary qRT-PCR data with increased Hlf expression upon extension of the protocol to 264h (Author response image 13), further confirming haematopoietic specification, including of candidate definitive progenitor cells, in the haemogenic gastruloid model.

      Author response image 13.

      Hlf expression is up-regulated in late stage haemogenic gastruloids. Quantitative RT-PCR analysis of Hlf expression in unfractionated haemogenic gastruloids cultured for 264h. From 168h onwards, haemogenic gastruloids were cultured in N2B27 in the presence of VEGF, SCF, FLT3L and TPO, all recombinant mouse cytokines, as described in the manuscript. Shown are mean±standard deviation of n=5 replicates from 2 mouse ES cell lines, respectively Flk1-GFP and Rosa26-BFP::Flk1-GFP, reported in the manuscript; 2-tailed unpaired t-test with Welch correction.
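
      As an illustration of the test named in this legend, the Welch statistic and its Welch-Satterthwaite degrees of freedom can be computed as sketched below. The input values are placeholders chosen for the example and are not measurements from this study; the function name is hypothetical. Obtaining the two-tailed p-value additionally requires the t-distribution, which a statistics package would normally supply.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for an unpaired t-test with Welch correction.

    Uses the sample variance of each group separately (no pooling) and the
    Welch-Satterthwaite approximation for the degrees of freedom.
    """
    va = variance(a) / len(a)  # squared standard error of group a
    vb = variance(b) / len(b)  # squared standard error of group b
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Placeholder values only (NOT data from the study), n=5 per group.
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [2.0, 4.0, 6.0, 8.0, 10.0]

t_stat, dof = welch_t(group_a, group_b)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

      Because the variances are not pooled, the degrees of freedom are generally non-integer and smaller than in the equal-variance test, which makes the comparison more conservative when group variances differ.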

      (5) Mapping of scRNA-Seq data onto the dataset by Thambyrajah et al. is not proof of the generation of AGM HE. The dataset they are mapping to only contains AGM cells, therefore cells do not have the option to map onto something that is not AGM. The authors should try mapping to other publicly available datasets also including YS cells.

      We have done this and the data are presented in Author response images 5 and 6. As detailed in response to Reviewer 1, we have conducted projections of our single-cell RNA-seq data against two studies which (1) capture arterial and haemogenic specification in the para-splanchnopleura (pSP) and AGM region between E8.0 and E11 (Hou et al, PMID: 32203131) (Author response image 5), and (2) uniquely capture YS, AGM and FL progenitors and the AGM endothelial-to-haematopoietic transition (EHT) in the same scRNA-seq dataset (Zhu et al, PMID: 32392346) (Author response image 6). Specifically in answering the Reviewer’s point, we show that different subsets of haemogenic gastruloid cells sorted on haemogenic surface markers c-Kit, CD41 and CD45 cluster onto pre-HE and HE, intra-aortic clusters and FL progenitor compartments, and onto YS EMP and erythroid and myeloid progenitors. This lends support to our claim that the haemogenic gastruloid system specifies both YS-like and AGM-like cells.

      (6) Conclusions in Figure 3, named "hGx specify cells with preHSC characteristics" are not supported by the data presented here. Again, I am not convinced that hematopoietic cells can be efficiently generated in this system, and certainly not HSCs or pre-HSCs.

      We have provided evidence, both in the manuscript and in this response to Reviewers, that there is haematopoietic specification, including of progenitor cells, in the haemogenic gastruloid system (Fig. 3 and Author response image 7,9). We have added data in this response that supports the specification of YS-like and AGM-like cells (Author response image 5-6, 8). Importantly, we have never claimed that haemogenic gastruloids generate HSC. We accept the Reviewer’s comment that we have not provided sufficient evidence for the specification of pre-HSC-like cells. We will re-phrase Fig. 3 conclusion as “Haemogenic gastruloids specify cells with characteristics of definitive haematopoietic progenitors”.

      - FACS analysis in 3A is again very unconvincing. I do not think the population identified as c-Kit+ CD144+ is real. Also, why not try gating the other way around, as commonly done (e.g. VE-Cad+ Kit+ and then CD41/CD45)?

      There is nothing unconventional about our gating strategy, which was done from a more populated gate onto the less abundant one to ensure that the results are numerically more robust. In the case of haemogenic gastruloids, unlike the AGM preparations the Reviewer may be referring to, CD41+ and CD45+ cells are more abundant as there is no circulation of more differentiated haematopoietic cells away from the endothelial structures. This said, we did perform the gating as suggested (Author response image 14), indeed confirming that most VE-cad+ Kit+ cells are CD45+. Interestingly, VE-cad+Kit- cells are predominantly CD41+, reinforcing the true haematopoietic nature of these cells.

      Author response image 14.

      Flow cytometry analysis of VE-cadherin+ cells in haemogenic gastruloids at 216h of the differentiation protocol, probing co-expression of CD45, CD41 and c-Kit.

      - The authors must have tried really hard, but the lack of short- or long-term engraftment in a number of immunodeficient mouse models (lines 305-313) really suggests that no blood progenitors are generated in their system. I am not familiar with the adrenal gland transplant system, but it seems like a very non-physiological system for trying to assess the maturation of putative pre-HSCs. The data supporting the engraftment of these mice, essentially seen only by PCR and in some cases with a very low threshold for detection, are very weak, and again unconvincing. It is stated that "BFP engraftment of the Spl and BM by flow cytometry was very low level albeit consistently above control (Fig. S4E)" (lines 337-338). I do not think that two dots in a dot plot can be presented as evidence of engraftment.

      We have presented the data with full disclosure and do not deny that the engraftment achieved is low-level and short-term, indicating incomplete maturation of definitive haematopoietic progenitors in the current haemogenic gastruloid system. However, we call the Reviewer’s attention to the fact that detection of BFP+ cells by PCR and flow cytometry in the recipient animals at 4 weeks is consistent between the 2 methods (Author response image 7).

      Furthermore, we have now also been able to detect low-level myelo-lymphoid engraftment in the bone marrow 8 weeks after adrenal implantation, again suggesting the presence of a small number of definitive haematopoietic progenitors that potentially mature from the 3 haemogenic gastruloids implanted (Author response image 9).

      (7) Given the above, I find that the foundations needed for extracting meaningful data from the system when perturbed are very shaky at best. Nevertheless, the authors proceed to overexpress MNX1 by LV transduction, a system previously shown to transform fetal liver cells, mimicking the effect of the t(7;12) AML-associated translocation. Comments on this section:

      - The increase in the size of the organoid when MNX1 is expressed is a very unspecific finding and not necessarily an indication of any hematopoietic effect of MNX1 OE.

      We agree with the Reviewer on this point; it is nevertheless a reproducible observation which we thought relevant to describe for completeness and data reproducibility.

      - The mild increase of cKit+ cells (Figure 4E) at the 144hr timepoint and the lack of any changes in CD41+ or CD45+ cells suggests that the increase in Kit+ cells % is not due to any hematopoietic effect of MNX1 OE. No hematopoietic GO categories are seen in RNA seq analysis, which supports this interpretation. Could it be that just endothelial cells are being generated?

      The Reviewer is correct that the MNX1-overexpressing cells have a strong endothelial signature, which is present in the patients (Fig. 4A). We investigated a potential link with c-Kit by staining cells from the replating colonies during the process of in vitro transformation with CD31. We observed that 40-50% of c-Kit+ cells (20-30% total colony cells) co-expressed CD31 (Author response image 15), at least at early plating. These cells co-exist with haematopoietic cells, namely Ter119+ cells, as expected from the YS-like erythroid and EMP-like affiliation of haematopoietic output from 144h-haemogenic gastruloids (Fig. 5F).

      Author response image 15.

      Endothelial affiliation of MNX1-oe replating cells from haemogenic gastruloid. A. Representative flow cytometry plot of plate 1 CFC from MNX1-overexpressing haemogenic gastruloids at 144h. B. Quantification of the proportion of CD31+c-Kit+ cells in plates 1 and 2 of MNX1-oe-driven in vitro transformation.

      (8) There seems to be a relatively convincing increase in replating potential upon MNX1-OE, but this experiment has been poorly characterized. What type of colonies are generated? What exactly is the "proportion of colony forming cells" in Figures 5B-D? The colony increase is accompanied by an increase in Kit+ cells; however, the flow cytometry analysis has not been quantified.

      Given the inability to replate control EV cells, there is not a population to compare with in terms of quantification. The level of c-Kit+ represented in Fig. 5E is achieved at plate 2 or 3 (depending on the experiment), both of which are significantly enriched for colony-forming cells relative to control (Fig. 5B, D).

      (9) Do hGx cells engraft upon MNX1-OE? This experiment, which appears not to have been performed, is essential to conclude that leukemic transformation has occurred.

      For the purpose of this study, we are satisfied with confirmation of in vitro transformation potential of MNX1 haemogenic gastruloids, which can be used for screening purposes. Although interesting, in vivo leukaemia engraftment from haemogenic gastruloids is beyond the scope of this study.

    1. Author response:

      We kindly thank the senior editor, the reviewing editor, and the esteemed reviewers for their invaluable insights, which have helped us to enhance our manuscript. The assessment and feedback, particularly on the role of directly released bacterial ATP versus OMV-delivered bacterial ATP and its effect on neutrophils, on addressing study limitations, and on discussing our models, are highly appreciated.

      The points you raised led us to critically rethink our approach, our results, and our conclusions. Furthermore, they gave us the chance to elaborate on some critical aspects that you mentioned. With your help, we will make clarifications throughout the manuscript, and we will add the data about neutrophil numbers in the different organs (reviewer #1, weaknesses #3).

      Reviewer #1 (Public Review):

      Summary:

      • Extracellular ATP represents a danger-associated molecular pattern associated with tissue damage and can also act in an autocrine fashion in macrophages to promote proinflammatory responses, as observed in a previous paper by the authors in abdominal sepsis. The present study addresses an important aspect possibly conditioning the outcome of sepsis, namely the release of ATP by bacteria. The authors show that sepsis-associated bacteria do in fact release ATP in a growth-dependent and strain-specific manner. However, whether this bacterial-derived ATP plays a role in the pathogenesis of abdominal sepsis has not been determined. To address this question, a number of mutant strains of E. coli have been used, first to correlate bacterial ATP release with growth and then with outer membrane integrity and bacterial death. By using E. coli transformants expressing the ATP-degrading enzyme apyrase in the periplasmic space, the paper nicely shows that abdominal sepsis by these transformants results in significantly improved survival. This effect was associated with a reduction of peritoneal macrophages and CX3CR1+ monocytes, and an increase in neutrophils. To extrapolate the function of bacterial ATP from the systemic response to microorganisms, the authors exploited bacterial OMVs either loaded or not with ATP to investigate the systemic effects devoid of living microorganisms. This approach showed that ATP-loaded OMVs induced degranulation of neutrophils after lysosomal uptake, suggesting that this mechanism could contribute to sepsis severity.

      Strengths:

      • A strong part of the study is the analysis of E. coli mutants to address different aspects of bacterial release of ATP that could be relevant during systemic dissemination of bacteria in the host.

      We want to thank the reviewer for recognizing this important aspect of our experimental approach.

      Weaknesses:

      • As pointed out in the limitations of the study whether ATP-loaded OMVs provide a mechanistic proof of the pathogenetic role of bacteria-derived ATP independently of live microorganisms in sepsis is interesting but not definitively convincing. It could be useful to see whether degranulation of neutrophils is differentially induced by apyrase-expressing vs control E. coli transformants.

      We thank the reviewer for raising several important points. In our study, we assessed local and systemic effects of released bacterial ATP. The consequences of local bacterial ATP release were assessed using an apyrase-expressing E. coli transformant. Locally, bacterial ATP resulted in a decrease in neutrophil numbers and we hypothesize that directly released bacterial ATP either leads to neutrophil death (e.g. via P2X7 receptor (Proietti et al., 2019)) or interferes with the recruitment of neutrophils (e.g. via P2Y receptors (Junger, 2011)).

      The systemic consequences were assessed using ATP-loaded and empty OMV. We have shown that degranulation is induced by OMV-derived bacterial ATP. ATP-containing OMV are engulfed by neutrophils, reach their endolysosomal compartment, and might activate purinergic receptors, which may then lead to aberrant degranulation. This concept, which needs to be explored in future studies, is fundamentally different from classical purinergic signaling via bacterial ATP released directly into the extracellular space.

      It is possible that neutrophil degranulation is also modulated by directly released bacterial ATP. We agree that this should be assessed in future studies. In addition, the role of OMV-derived bacterial ATP should be assessed locally, and the relative importance of directly released versus OMV-mediated bacterial ATP should be dissected. Based on our measurements (Figure 4-figure supplement 1A and Figure 5C), we estimate that the effect of OMV-derived bacterial ATP might be much smaller than the effects of directly released bacterial ATP. Thus, direct ATP release might predominate locally. However, we fully agree that this has to be investigated in a future study to reconcile the different aspects of bacterial ATP signaling. A paragraph will be added to the manuscript, in which we discuss this particular issue.

      • Also, the increase of neutrophils in bacterial ATP-depleted abdominal sepsis, which has better outcomes than "ATP-proficient" sepsis, seems difficult to correlate to the hypothesized tissue damage induced by ATP delivered via non-infectious OMVs.

      We fully acknowledge the discrepancy mentioned. What we propose is that bacterial ATP exhibits different functions that depend on the release mechanism (see above). Locally, in the peritoneal cavity, neutrophil numbers are decreased by directly released bacterial ATP. Remotely, ATP is delivered via OMV and affects neutrophil function. We agree that both effects may play a role, particularly in the peritoneal cavity. However, the impact of directly released bacterial ATP seems to be dominant (see above).

      We propose that neutrophils are decreased locally because of directly released bacterial ATP, which prevents efficient infection control and, therefore, impairs sepsis survival. In addition, these fewer neutrophils might even be dysregulated by the engulfment of bacterial ATP delivered via OMV, which leads to an upregulated and possibly aberrant degranulation process worsening local and remote tissue damage. We agree that in addition to neutrophil numbers, the function of local neutrophils should be assessed with and without the influence of OMV-delivered bacterial ATP. This could be done by RNA sequencing of primary neutrophils from the peritoneal cavity or neutrophil cell lines as well as degranulation assays.

      • Are the neutrophils counts affected by ATP delivered via OMVs?

      This is difficult to show in the peritoneal cavity, where both directly released bacterial ATP and OMV-derived bacterial ATP are present. We did, however, assess this putative difference in the systemic organs and the blood, where we did not find any differences in neutrophil numbers. We will include the figure in the revised manuscript as Figure 6-figure supplement 3C.

      Author response image 1.

      • A comparison of cytokine profiles in the abdominal fluids of E. coli and OMV treated animals could be helpful in defining the different responses induced by OMV-delivered vs bacterial-released ATP. The analyses performed on OMV treated versus E. coli infected mice are not closely related and difficult to combine when trying to draw a hypothesis for bacterial ATP in sepsis.

      We fully agree that several open questions remain to be elucidated, in particular how to differentiate the local role of directly released versus OMV-delivered bacterial ATP. In this study, we laid the foundation for future in vivo research to examine the specific role of bacterial ATP in sepsis. Such future research might investigate the local effects of OMV-delivered bacterial ATP, and how neutrophil migration, apoptosis and degranulation are altered. We agree that exploration of the local secretory immune response and cytokine profiles is relevant to understanding the different mechanisms by which bacterial ATP alters sepsis. However, such experiments should ideally be performed in systems where the source and the delivery of ATP can be modulated locally.

      • Also it was not clear why lung neutrophils were used for the RNAseq data generation and analysis.

      Thank you for this remark. We have chosen primary lung neutrophils for four reasons:

      (1) Isolation of primary lung neutrophils allowed us to assess an in vivo response that would not have been possible with cell lines.

      (2) The lung and the respiratory system are among the clinically most important organ systems affected during sepsis, and their involvement is a significant cause of mortality.

      (3) We show in Figure 6C that specifically in the lung, OMV are engulfed by neutrophils, which shows the relevance of the lung also in our study context.

      (4) And finally, lung neutrophils were chosen to examine specifically distant and not local effects.

      Reviewer #2 (Public Review):

      Summary:

      • In their manuscript "Released Bacterial ATP Shapes Local and Systemic Inflammation during Abdominal Sepsis", Daniel Spari et al. explored the dual role of ATP in exacerbating sepsis, revealing that ATP from both host and bacteria significantly impacts immune responses and disease progression.

      Strengths:

      • The study meticulously examines the complex relationship between ATP release and bacterial growth, membrane integrity, and how bacterial ATP potentially dampens inflammatory responses, thereby impairing survival in sepsis models. Additionally, this compelling paper implies a concept that bacterial OMVs act as vehicles for the systemic distribution of ATP, influencing neutrophil activity and exacerbating sepsis severity.

      We thank the reviewer for mentioning these key points and supporting the relevance of our study.

      Weaknesses:

      (1) The researchers extracted and cultivated abdominal fluid on LB agar plates, then randomly picked 25 colonies for analysis. However, they did not conduct 16S rRNA gene amplicon sequencing on the fluid itself. It is worth noting that the bacterial species present may vary depending on the individual patients. It would be beneficial if the authors could specify whether they've verified the existence of unculturable species capable of secreting high levels of Extracellular ATP.

      Most septic complications are caused by a limited spectrum of bacteria, belonging mainly either to the Firmicutes or the Proteobacteria phyla, including E. coli, K. pneumoniae, S. aureus or E. faecalis (Diekema et al., 2019; Mureșan et al., 2018). We validated this well-documented evidence by randomly assessing 25 colonies. For the planned experiments, it was crucial to work with culturable bacteria; otherwise, ATP measurements, the modulation of ATP generation or loading of OMV would not have been possible. Using such culturable bacteria allowed us to describe mechanisms of ATP release.

      We fully agree that hard-to-culture or unculturable bacteria might contribute significantly to septic complications. This, however, would need to be explored in future studies using extensive culturing methods (Cheng et al., 2022).

      (2) Do mice lacking commensal bacteria show a lack of extracellular ATP following cecal ligation puncture?

      ATP is typically secreted by many cells of the host in active and passive manners in the case of any injury, including cecal ligation and puncture (Burnstock, 2016; Dosch et al., 2018; Eltzschig et al., 2012; Idzko et al., 2014). We hypothesize that bacterial ATP is a potential priming agent at early stages of sepsis, and indeed, at such early time points, a comparison of peritoneal ATP levels between germ-free and colonized mice could support our hypothesis. Future studies addressing this question must, however, correct for the different immune responses between germ-free and colonized mice. This is of utmost importance, especially for the cecal ligation and puncture model, since the cecum of germ-free mice is extremely large, making such experiments hard to control.

      (3) The authors isolated various bacteria from abdominal fluid, encompassing both Gram-negative and Gram-positive types. Nevertheless, their emphasis appeared to be primarily on the Gram-negative E. coli. It would be beneficial to ascertain whether the mechanisms of Extracellular ATP release differ between Gram-positive and Gram-negative bacteria. This is particularly relevant given that the Gram-positive bacterium E. faecalis, also isolated from the abdominal fluid, is recognized for its propensity to release substantial amounts of Extracellular ATP.

      We fully agree with this comment. In this paper, we used E. coli as our model organism to determine the principles of sepsis-associated bacterial ATP release and therefore focused on gram-negative bacteria. In addition to the direct, growth-dependent release, we found a relevant impact of OMV-delivered bacterial ATP. For this latter purpose, a gram-negative strain, in which OMV generation has been well described (Schwechheimer & Kuehn, 2015), was chosen. Recently, gram-positive bacteria have been shown to secrete ATP and OMV as well (Briaud & Carroll, 2020; Hironaka et al., 2013; Iwase et al., 2010). Given the fundamental differences in the structure of the cell wall of gram-positive bacteria and the mechanisms of OMV generation and release, future studies are required to assess the relevance of directly released and OMV-delivered ATP in gram-positive bacteria.

      (4) The authors observed changes in the levels of LPM, SPM, and neutrophils in vivo. However, it remains uncertain whether the proliferation or migration of these cells is modulated or inhibited by ATP receptors like P2Y receptors. This aspect requires further investigation to establish a convincing connection.

      We fully agree with this comment. The decrease in LPM and the consequential predominance of SPM have been well described after inflammatory stimuli in the context of the macrophage disappearance reaction (Ghosn et al., 2010). Also, it has been shown that purinergic signaling modulates infiltration of neutrophils and can lead to cell death as a consequence of P2Y and P2X receptor activation (Junger, 2011; Proietti et al., 2019). In our study, we propose that intracellular purinergic receptors contribute to neutrophil function during sepsis. Having introduced the general principles and fundamentals of bacterial ATP in our study, we fully agree that additional experiments need to address downstream purinergic receptor activation. That, however, would go beyond the scope of our study.

      (5) Additionally, is it possible that the observed in vivo changes could be triggered by bacterial components other than Extracellular ATP? In this research field, a comprehensive collection of inhibitors is available, so it is desirable to utilize them to demonstrate clearer results.

      This question is of utmost importance and defined the choice of our model and experimental approach. When we started the project, we used two different E. coli mutants that release low (ompC) and high (eaeH) amounts of ATP. However, the limitation of this approach is that these are different bacteria, which may also differ in the components they secrete or the surface proteins they express. We, therefore, decided against that approach. With the approach we finally used (same bacterium, just with and without ATP), we aimed to minimize the influence of non-ATP bacterial components.

      (6) Have the authors considered the role of host-derived Extracellular ATP in the context of inflammation?

      Yes, the role of host-derived extracellular ATP in inflammation and sepsis is well established, although with contradictory results (Csóka et al., 2015; Ledderose et al., 2016). These conflicting data were the rationale for testing the relevance of bacterial ATP. We suggest that bacterial ATP is essential in the early phase of sepsis, when bacteria invade the sterile compartment and before an efficient host response, including the eukaryotic release of ATP, is established.

      (7) The authors mention that Extracellular ATP is rapidly hydrolyzed by ectonucleotidases in vivo. Are the changes of immune cells within the peritoneal cavity caused by Extracellular ATP released from bacterial death or by OMVs?

      This is a relevant question that was also asked by reviewer #1, and we answered it in detail above (weaknesses comments #1 and #2). From our ATP measurements (Figure 4-figure supplement 1A and Figure 5C), we conclude that locally, the role of directly released bacterial ATP (extracellular) predominates over OMV-derived bacterial ATP. Furthermore, the mechanisms of action of directly released bacterial ATP and OMV-derived bacterial ATP (within OMV, engulfed and transported to the endolysosomal compartment) differ; in particular, extracellular ATP has been described to lead to apoptosis via P2X7 signaling.

      (8) In the manuscript, the sample size (n) for the data consistently remains at 2. I would suggest expanding the sample size to enhance the robustness and rigor of the results.

      Two biological replicates (independent cultures) were used only for the bacterial cultures in Figure 1, Figure 2, and Figure 3; these yielded similar results with very small standard deviations, indicating that the measurements are robust. In the in vitro experiments in Figure 5, we used a sample size of 6 (three biological replicates measured in technical duplicates), since we saw larger deviations in our measurements. For the in vivo experiments, we always used 5 or more animals in at least two independent experiments.

      References

      Briaud, P., & Carroll, R. K. (2020). Extracellular Vesicle Biogenesis and Functions in Gram-Positive Bacteria. Infection and Immunity, 88(12), e00433-20. https://doi.org/10.1128/iai.00433-20

      Burnstock, G. (2016). P2X ion channel receptors and inflammation. Purinergic Signalling, 12(1), 59–67. https://doi.org/10.1007/s11302-015-9493-0

      Cheng, A. G., Ho, P.-Y., Aranda-Díaz, A., Jain, S., Yu, F. B., Meng, X., Wang, M., Iakiviak, M., Nagashima, K., Zhao, A., Murugkar, P., Patil, A., Atabakhsh, K., Weakley, A., Yan, J., Brumbaugh, A. R., Higginbottom, S., Dimas, A., Shiver, A. L., … Fischbach, M. A. (2022). Design, construction, and in vivo augmentation of a complex gut microbiome. Cell, 185(19), 3617-3636.e19. https://doi.org/10.1016/j.cell.2022.08.003

      Csóka, B., Németh, Z. H., Törő, G., Idzko, M., Zech, A., Koscsó, B., Spolarics, Z., Antonioli, L., Cseri, K., Erdélyi, K., Pacher, P., & Haskó, G. (2015). Extracellular ATP protects against sepsis through macrophage P2X7 purinergic receptors by enhancing intracellular bacterial killing. The FASEB Journal, 29(9), 3626–3637. https://doi.org/10.1096/fj.15-272450

      Diekema, D. J., Hsueh, P.-R., Mendes, R. E., Pfaller, M. A., Rolston, K. V., Sader, H. S., & Jones, R. N. (2019). The Microbiology of Bloodstream Infection: 20-Year Trends from the SENTRY Antimicrobial Surveillance Program. Antimicrobial Agents and Chemotherapy, 63(7), e00355-19. https://doi.org/10.1128/AAC.00355-19

      Dosch, M., Gerber, J., Jebbawi, F., & Beldi, G. (2018). Mechanisms of ATP Release by Inflammatory Cells. International Journal of Molecular Sciences, 19(4), 1222. https://doi.org/10.3390/ijms19041222

      Eltzschig, H. K., Sitkovsky, M. V., & Robson, S. C. (2012). Purinergic Signaling during Inflammation. New England Journal of Medicine, 367(24), 2322–2333. https://doi.org/10.1056/NEJMra1205750

      Ghosn, E. E. B., Cassado, A. A., Govoni, G. R., Fukuhara, T., Yang, Y., Monack, D. M., Bortoluci, K. R., Almeida, S. R., Herzenberg, L. A., & Herzenberg, L. A. (2010). Two physically, functionally, and developmentally distinct peritoneal macrophage subsets. Proceedings of the National Academy of Sciences, 107(6), 2568–2573. https://doi.org/10.1073/pnas.0915000107

      Hironaka, I., Iwase, T., Sugimoto, S., Okuda, K., Tajima, A., Yanaga, K., & Mizunoe, Y. (2013). Glucose Triggers ATP Secretion from Bacteria in a Growth-Phase-Dependent Manner. Applied and Environmental Microbiology, 79(7), 2328–2335. https://doi.org/10.1128/AEM.03871-12

      Idzko, M., Ferrari, D., & Eltzschig, H. K. (2014). Nucleotide signalling during inflammation. Nature, 509(7500), 310–317. https://doi.org/10.1038/nature13085

      Iwase, T., Shinji, H., Tajima, A., Sato, F., Tamura, T., Iwamoto, T., Yoneda, M., & Mizunoe, Y. (2010). Isolation and Identification of ATP-Secreting Bacteria from Mice and Humans. Journal of Clinical Microbiology, 48(5), 1949–1951. https://doi.org/10.1128/JCM.01941-09

      Junger, W. G. (2011). Immune cell regulation by autocrine purinergic signalling. Nature Reviews Immunology, 11(3), 201–212. https://doi.org/10.1038/nri2938

      Ledderose, C., Bao, Y., Kondo, Y., Fakhari, M., Slubowski, C., Zhang, J., & Junger, W. G. (2016). Purinergic Signaling and the Immune Response in Sepsis: A Review. Clinical Therapeutics, 38(5), 1054–1065. https://doi.org/10.1016/j.clinthera.2016.04.002

      Mureșan, M. G., Balmoș, I. A., Badea, I., & Santini, A. (2018). Abdominal Sepsis: An Update. The Journal of Critical Care Medicine, 4(4), 120–125. https://doi.org/10.2478/jccm-2018-0023

      Proietti, M., Perruzza, L., Scribano, D., Pellegrini, G., D’Antuono, R., Strati, F., Raffaelli, M., Gonzalez, S. F., Thelen, M., Hardt, W.-D., Slack, E., Nicoletti, M., & Grassi, F. (2019). ATP released by intestinal bacteria limits the generation of protective IgA against enteropathogens. Nature Communications, 10(1), 250. https://doi.org/10.1038/s41467-018-08156-z

      Schwechheimer, C., & Kuehn, M. J. (2015). Outer-membrane vesicles from Gram-negative bacteria: Biogenesis and functions. Nature Reviews Microbiology, 13(10), 605–619. https://doi.org/10.1038/nrmicro3525

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This work made a considerable effort to explore the multifaceted roles of the inferior colliculus (IC) in auditory processing, extending beyond traditional sensory encoding. The authors recorded neuronal activity from the IC at the single-unit level while monkeys were passively exposed to sounds or actively engaged in a behavioral task. They concluded that 1) IC neurons showed sustained firing patterns related to sound duration, indicating their roles in temporal perception, 2) IC neuronal firing rates increased as sound sequences progressed, reflecting modulation by behavioral context rather than reward anticipation, 3) IC neurons encode reward prediction error and are capable of adjusting responses based on reward predictability, 4) IC neural activity correlates with decision-making. In summary, this study tried to provide a new perspective on IC functions by exploring its roles in sensory prediction and reward processing, which are not traditionally associated with this structure.

      Strengths:

      The major strength of this work is that the authors performed electrophysiological recordings from the IC of behaving monkeys. Compared with the auditory cortex and thalamus, the IC in monkeys has not been adequately explored.

      We appreciate the reviewer’s acknowledgment of the efforts and strengths of our study. Indeed, our goal was to provide a comprehensive exploration of the multifaceted roles of the inferior colliculus (IC) in auditory processing and beyond, particularly in sensory prediction and reward processing. The use of electrophysiological recordings in behaving monkeys was central to our approach, as we sought to uncover the underexplored aspects of IC function in these complex cognitive domains. We are pleased that the reviewer recognizes the value of investigating the IC, a structure that has not been adequately explored in primates compared to other auditory regions like the cortex and thalamus. This feedback reinforces our belief that our work contributes significantly to advancing the understanding of the IC's roles in cognitive processing.

      We look forward to addressing any further points the reviewers may have and refining our manuscript accordingly. Thank you for your constructive feedback and for recognizing the strengths of our research approach.

      Weaknesses:

      (1) The authors cited several papers focusing on dopaminergic inputs in the IC to suggest the involvement of this brain region in cognitive functions. However, all of the cited work was done in rodents. Whether the monkey IC shares similar inputs is not clear.

      We appreciate the reviewer's insightful comment on the limitations of extrapolating findings from rodent models to monkeys, particularly concerning dopaminergic inputs to the Inferior Colliculus (IC). While it is true that most studies on dopaminergic inputs to the IC have been conducted in rodents, to our knowledge, no studies have been conducted specifically in primates. To address the reviewer's concern, we have added a statement in both the introduction and discussion sections of our manuscript:

      - Introduction: "However, these studies were conducted in rodents, and the existence and role of dopaminergic inputs in the primate IC remain underexplored."

      - Discussion: "However, the exact mechanisms and functions of dopamine modulation in the inferior colliculus are still not fully understood, particularly in primates."

      (2) The authors confused the two terms, novelty and deviation. According to their behavioral paradigm, deviation rather than novelty should be used in the paper because all the stimuli have been presented to the monkeys during training. Therefore, there are actually no novel stimuli but only deviant stimuli. This reflects that the authors have misunderstood the basic concept.

      We appreciate the reviewer's clarification regarding the distinction between "novelty" and "deviation" in the context of our behavioral paradigm. We agree that, given the nature of our experimental design where all stimuli were familiar to the monkeys during training, the term "deviation" more accurately describes the stimuli used in our study rather than "novelty."

      To address this, we have revised the manuscript to replace the term "novelty" with "deviation" wherever applicable. This change has been made to ensure accurate terminology is used throughout the paper, thereby eliminating any potential misunderstanding of the concepts involved in our study.

      We thank the reviewer for pointing out this important distinction, which has improved the clarity and precision of our manuscript.

      (3) Most of the conclusions were made based on correlational analysis or speculation without providing causal evidence.

      We appreciate the reviewer’s concern regarding the reliance on correlational analyses in our study. Indeed, we acknowledge that the conclusions drawn primarily reflect correlations between neuronal activity and behavioral outcomes, rather than direct causal evidence. This limitation is inherent to many electrophysiological studies, particularly those conducted in behaving primates, where direct manipulation of specific neural circuits to establish causality is often challenging.

      This limitation becomes even more complex when considering the IC’s role as a key lower-level relay station in the auditory pathway. Manipulating IC activity could potentially affect auditory responses in downstream pathways, which, in turn, may influence sensory prediction and decision-making processes. Moreover, we hypothesize that the sensory prediction and reward signals observed in the IC may not have direct causal effects but may instead be driven by top-down projections from higher cognitive regions. However, it is important to emphasize that our study provides novel evidence that the IC may exhibit multiple facets of cognitive signaling, which could inspire future research into the underlying mechanisms and broader functional implications of these signals.

      To address this, we have taken the following steps in our revised manuscript:

      (1) Clarified the Scope of Conclusions: We have revised the language in the Results and Discussion sections to explicitly state that our findings represent correlational relationships rather than causal mechanisms. For example, we now refer to the associations observed between IC activity and behavioral outcomes as "correlational" and have refrained from making definitive causal claims without supporting experimental evidence.

      (2) Proposed Future Directions: In the Discussion section, we have included suggestions for future studies to directly test the causality of the observed relationships. We acknowledge the need for further investigation to substantiate the causal links between IC activity and cognitive functions such as sensory prediction, decision-making, and reward processing.

      We believe these revisions provide a more balanced interpretation of our findings while emphasizing the importance of future research to build on our results and establish causal relationships. Thank you for raising this critical point, which has led to a more rigorous and transparent presentation of our study.

      (4) Results are presented in a very "straightforward" manner with too many detailed descriptions of phenomena but lack summary and information synthesis. For example, the first section of Results is very long but did not convey clear information.

      We appreciate the reviewer’s feedback regarding the presentation of our results. We understand that the detailed descriptions of phenomena may have made it difficult to discern the key findings and overarching themes in the study. We recognize the importance of balancing detailed reporting with clear summaries and synthesis to effectively communicate our findings.

      To address this concern, we have made the following revisions to the manuscript:

      (1) Condensed and Synthesized Key Findings: We have streamlined the presentation of the Results section by condensing overly detailed descriptions and focusing on the most critical aspects of the data. Key findings are now summarized at the end of each subsection to ensure that the main points are clearly conveyed.

      (2) Enhanced Section Summaries: We have added summary statements at the end of each major results section to synthesize the findings and highlight their significance. This should help guide the reader through the narrative and emphasize the key takeaways from each part of the study.

      (3) Improved Flow and Clarity: We have revised the structure and organization of the Results section to improve the flow of information. By rearranging certain paragraphs and refining the language, we aim to present the results in a more cohesive and coherent manner.

      We believe these changes will make the Results section more accessible and informative, allowing readers to more easily grasp the significance of our findings. Thank you for your valuable suggestion, which has significantly improved the clarity and impact of our manuscript.

      (5) The logic between different sections of Results is not clear.

      We appreciate the reviewer’s observation regarding the lack of clear logical connections between different sections of the Results. We acknowledge that a coherent flow is essential for effectively communicating the progression of findings and their implications.

      To address this concern, we have made the following revisions:

      (1) Enhanced Transitions Between Sections: We have introduced clearer transitional statements between sections of the Results. These transitions explicitly state how each new section builds upon or relates to the previous findings, creating a more cohesive narrative.

      (2) Integration of Findings: In several places within the Results, we have added brief synthesis paragraphs that integrate findings across sections. These integrative summaries help to tie together the different aspects of our study, demonstrating how they collectively contribute to our understanding of the role of the inferior colliculus (IC) in sensory prediction, decision-making, and reward processing.

      (3) Clarified Rationale: At the beginning of each major section, we have clarified the rationale behind why certain experiments were conducted, connecting them more clearly to the overarching goals of the study. This should help the reader understand the purpose of each set of results in the context of the broader research objectives.

      We believe these changes improve the overall coherence and readability of the Results section, allowing readers to better follow the logical progression of our study. We are grateful for this constructive feedback and believe it has significantly enhanced the manuscript.

      (6) In the Discussion, there is excessive repetition of results, and comparison with and discussion of potentially related work are insufficient. For example, Metzger, R.R., et al. (J Neurosci, 2006) have shown similar firing patterns of IC neurons and correlated their findings with reward.

      We appreciate the reviewer's insightful critique regarding the excessive repetition in the Discussion and the lack of sufficient comparison with related work. We acknowledge that a well-balanced Discussion should not only interpret findings but also place them in the context of existing literature to highlight the novelty and significance of the study.

      To address these concerns, we have made the following revisions:

      (1) Reduction of Repetition: We have carefully revised the Discussion to minimize redundant repetition of the Results. Instead of restating the findings, we now focus more on their implications, limitations, and how they advance the current understanding of the Inferior Colliculus (IC) and its broader cognitive roles.

      (2) Incorporation of Related Work: We have expanded the Discussion to include a more comprehensive comparison with existing literature, specifically highlighting studies that have reported similar findings. For example, we now discuss the work by Metzger et al. (2006), which demonstrated similar firing patterns of IC neurons and correlated these with reward-related processes. This comparison helps contextualize our results and emphasizes the novel contributions our study makes to the field.

      We believe these revisions have significantly improved the quality of the Discussion by reducing unnecessary repetition and providing a more thorough engagement with the relevant literature. We are grateful for the reviewer's valuable feedback, which has helped us refine and strengthen the manuscript.

      Reviewer #2 (Public review):

      Summary:

      The inferior colliculus (IC) has been explored for its possible functions in behavioral tasks and has been suggested to play more important roles rather than simple sensory transmission. The authors revealed the climbing effect of neurons in IC during decision-making tasks, and tried to explore the reward effect in this condition.

      Strengths:

      Complex cognitive behaviors can be regarded as simple ideals of generating output based on information input, which depends on all kinds of input from sensory systems. The auditory system has hierarchic structures no less complex than those areas in charge of complex functions. Meanwhile, IC receives projections from higher areas, such as auditory cortex, which implies IC is involved in complex behaviors. Experiments in behavioral monkeys are always time-consuming works with hardship, and this will offer more approximate knowledge of how the human brain works.

      We greatly appreciate the reviewer's positive summary of our work and recognition of the effort involved in conducting experiments on behaving monkeys. We agree with the reviewer that the inferior colliculus (IC) plays a significant role beyond mere sensory transmission, particularly in integrating sensory inputs with higher cognitive functions. Our study aims to shed light on these complex functions by revealing the climbing effect of IC neurons during decision-making tasks and exploring how reward influences this dynamic.

      We are encouraged that the reviewer acknowledges the importance of investigating the IC's role within the broader framework of complex cognitive behaviors and appreciates the hierarchical nature of the auditory system. The reviewer's comments reinforce the value of our research in contributing to a more nuanced understanding of how the IC might contribute to sensory-cognitive integration.

      We thank the reviewer for highlighting the significance of using behavioral monkey models to approximate human brain function. We are hopeful that our findings will serve as a stepping stone for further research exploring the multifaceted roles of the IC in cognition and behavior.

      We will now proceed to address the specific concerns and suggestions provided by the reviewer in the following sections.

      Weaknesses:

      These findings are more about correlation but not causality of IC function in behaviors. And I have a few major concerns.

      We appreciate the reviewer’s concern regarding the reliance on correlational analyses in our study. We acknowledge the importance of distinguishing between correlation and causality. As detailed in our response to Question 3 from Reviewer #1, we recognize the limitations of relying on correlational data and the challenges of establishing direct causal links in electrophysiological studies involving behaving primates.

      We have taken steps to clarify this distinction throughout our manuscript. Specifically, we have revised the Results and Discussion sections to ensure that the findings are presented as correlational, not causal, and we have proposed future studies utilizing more direct manipulation techniques to assess causality. We hope these revisions adequately address your concerns.

      Comparing neurons' spike activities in different tests, a 'climbing effect' was found in the oddball paradigm. The effect is clearly related to the training and learning process, but it still requires more exploration to rule out a few explanations. First, repeated white noise bursts with a fixed inter-stimulus interval of 0.6 seconds were presented, so monkeys might remember the sounds by rhythm, which is some sort of learned auditory response. It would be interesting to know the monkeys' responses and the neurons' activities if the inter-stimulus interval were variable. Second, the task only asked monkeys to press one button, and the reward ratio (the ratio of correct response trials) was around 78% (based on the number from Line 302), so that, in the sessions with reward, monkeys had highly expected reward chances. Does this expectation cause the climbing effect?

      We thank the reviewer for raising these insightful points regarding the 'climbing effect' observed in the oddball paradigm and its potential relationship with training, learning processes, and reward expectation. Below, we address each of the reviewer's specific concerns:

      (1) Inter-Stimulus Interval (ISI) and Rhythmic Auditory Response:

      The reviewer suggests that the fixed inter-stimulus interval (ISI) of 0.6 seconds might lead to a rhythmic auditory response, where monkeys could anticipate the sounds. We appreciate this perspective. However, we believe that rhythm is unlikely to play a significant role in the 'climbing effect' for the following reason: the 'climbing effect' starts from the second sound in the block (Fig. 2D and Fig. 3B), before any rhythm or pattern could be fully established, as a rhythm generally requires at least three repetitions to form. Unfortunately, we did not explore variable ISIs in the current study, so we cannot directly address this concern with the data at hand.

      (2) Reward Expectation and Climbing Effect:

      The reviewer raises an important concern about whether the 'climbing effect' could be influenced by the monkeys' high reward expectation, especially given the high reward ratio (~78%) in the sessions. While it is plausible that reward expectation could contribute to the observed increase in neuronal firing rates, we believe the results from our reward experiment (Fig. 4) suggest otherwise. In this experiment, even though reward expectation was likely formed due to the consistent pairing of sounds with rewards (100%), we did not observe a climbing effect in the auditory response. The presence of reward prediction error (Fig. 4D) further suggests that while the monkeys may form reward expectations, these expectations do not directly drive the climbing effect.

      To clarify this point, we have added sentences in the revised manuscript to explicitly discuss the relationship between reward expectation and the climbing effect, emphasizing that our findings indicate the climbing effect is not primarily due to reward expectation.

      We believe these revisions provide a clearer understanding of the factors contributing to the climbing effect and address the reviewer's concerns effectively. Thank you for these valuable suggestions.

      The "reward effect" on IC neurons' responses was shown in Fig. 4. Is this auditory response caused by the physical reward action or not? In reward sessions, IC neurons have an obvious response related to the onset of the water reward. An electromagnetic valve is often used in water-rewarding systems and gives out a loud click sound every time the reward is triggered. IC neurons' responses may simply be caused by the click sound if an electromagnetic valve is used. It is important to find a way to rule out this simple possibility.

      We appreciate the reviewer’s concern regarding the potential confounding factor introduced by the electromagnetic valve’s click sound during water reward delivery, which could be misinterpreted as an auditory response rather than a response to the reward itself. Anticipating this possibility, we took measures to eliminate it by placing the electromagnetic valve outside the soundproof room where the neuronal recordings were performed.

      To address your concern more explicitly, we have added sentences in the Methods section of the revised manuscript detailing this setup, ensuring that readers are aware of the steps we took to eliminate this potential confound. By doing so, we believe that the observed reward-related neural activity in the IC is attributable to the reward processing itself rather than an auditory response to the valve click. We appreciate you bringing this important aspect to our attention, and we hope our clarification strengthens the interpretation of our findings.

      Reviewer #3 (Public review):

      Summary:

      The authors aimed to investigate the multifaceted roles of the Inferior Colliculus (IC) in auditory and cognitive processes in monkeys. Through extracellular recordings during a sound duration-based novelty detection task, the authors observed a "climbing effect" in neuronal firing rates, suggesting an enhanced response during sensory prediction. Observations of reward prediction errors within the IC further highlight its complex integration in both auditory and reward processing. Additionally, the study indicated IC neuronal activities could be involved in decision-making processes.

      Strengths:

      This study has the potential to significantly impact the field by challenging the traditional view of the IC as merely an auditory relay station and proposing a more integrative role in cognitive processing. The results provide valuable insights into the complex roles of the IC, particularly in sensory and cognitive integration, and could inspire further research into the cognitive functions of the IC.

      We appreciate the reviewer’s positive summary of our work and recognition of its potential impact on the field. We are pleased that the reviewer acknowledges the significance of our findings in challenging the traditional view of the Inferior Colliculus (IC) as merely an auditory relay station and in proposing its integrative role in cognitive processing.

      Our study indeed aims to provide new insights into the multifaceted roles of the IC, particularly in the context of sensory and cognitive integration. We believe that this research could pave the way for future studies that further explore the cognitive functions of the IC and its involvement in complex behavioral processes.

      We are encouraged by the reviewer’s positive assessment and are committed to continuing to refine our work in response to the constructive feedback provided. We hope that our findings will contribute to advancing the understanding of the IC’s role in the broader context of neuroscience.

      We will now proceed to address the specific concerns and suggestions provided by the reviewer in the following sections.

      Weaknesses:

      Major Comments:

      (1) Structural Clarity and Logic Flow:

      The manuscript investigates three intriguing functions of IC neurons: sensory prediction, reward prediction, and cognitive decision-making, each of which is a compelling topic. However, the logical flow of the manuscript is not clearly presented and needs to be well recognized. For instance, Figure 3 should be merged into Figure 2 to present population responses to the order of sounds, thereby focusing on sensory prediction. Given the current arrangement of results and figures, the title could be more aptly phrased as "Beyond Auditory Relay: Dissecting the Inferior Colliculus's Role in Sensory Prediction, Reward Prediction, and Cognitive Decision-Making."

      We appreciate the reviewer’s detailed feedback on the structural clarity and logical flow of the manuscript. We understand the importance of presenting our findings in a clear and cohesive manner, especially when addressing multiple complex topics such as sensory prediction, reward prediction, and cognitive decision-making.

      To address the reviewer's concerns, we have made the following revisions:

      (1) Reorganization of Figures and Results:

      We agree with the suggestion to merge Figure 3 into Figure 2. By doing so, we can present the population responses to the order of sounds more effectively, thereby streamlining the focus on sensory prediction. This will allow readers to more easily follow the progression of the results related to this key function of the IC.

      We have reorganized the Results section to ensure a smoother transition between the different aspects of IC function that we are investigating. The new structure will better guide the reader through the narrative, aligning with the themes of sensory prediction, reward prediction, and cognitive decision-making.

      (2) Revised Title:

      In line with the reviewer's suggestion, we have revised the title to "Beyond Auditory Relay: Dissecting the Inferior Colliculus's Role in Sensory Prediction, Reward Prediction, and Cognitive Decision-Making." We believe this title more accurately reflects the scope and focus of our study, as it highlights the three core functions of the IC that we are investigating.

      (3) Improved Logic Flow:

      We have added introductory statements at the beginning of each section within the Results to clarify the rationale behind the experiments and the logical connections between them. This should help to improve the overall flow of the manuscript and make the progression of our findings more intuitive for readers.

      We believe these changes significantly enhance the clarity and logical structure of the manuscript, making it easier for readers to understand the sequence and importance of our findings. Thank you for your valuable suggestion, which has led to a more coherent and focused presentation of our work.

      (2) Clarification of Data Analysis:

      Key information regarding data analysis is dispersed throughout the results section, which can lead to confusion. Providing a more detailed and cohesive explanation of the experimental design would significantly enhance the interpretation of the findings. For instance, including a detailed timeline and reward information for the behavioral paradigms shown in Figures 1C and D would offer crucial context for the study. More importantly, clearly presenting the analysis temporal windows and providing comprehensive statistical analysis details would greatly improve reader comprehension.

      We appreciate the reviewer’s insightful comment regarding the need for clearer and more cohesive explanations of the data analysis and experimental design. We recognize that a well-structured presentation of this information is essential for the reader to fully understand and interpret our findings. To address this, we have made the following revisions:

      (1) Detailed Explanation of Experimental Design:

      We have included a more detailed explanation of the experimental design, particularly for the behavioral paradigms shown in Figures 1C and 1D. This includes a comprehensive timeline of the experiments, along with explicit information about the reward structure and timing. By providing this context upfront, we aim to give readers a clearer understanding of the conditions under which the neuronal recordings were obtained.

      (2) Cohesive Presentation of Data Analysis:

      Key information regarding data analysis, which was previously dispersed throughout the Results section, has been consolidated and moved to a dedicated subsection within the Methods. This subsection now provides a step-by-step description of the analysis process, including the temporal windows used for examining neuronal activity, as well as the specific statistical methods employed.

      We have also ensured that the temporal windows used for different analyses (e.g., onset window, late window, etc.) are clearly defined and consistently referenced throughout the manuscript. This will help readers track the use of these windows across different figures and analyses.

      (3) Enhanced Statistical Analysis Details:

      We have expanded the description of the statistical analyses performed in the study, including the rationale behind the choice of tests, the criteria for significance, and any corrections for multiple comparisons. These details are now presented in a clear and accessible format within the Methods section, with relevant information also highlighted in the Results section or the figure legends to facilitate understanding.

      We believe these changes will significantly improve the clarity and comprehensibility of the manuscript, allowing readers to better follow the experimental design, data analysis, and the conclusions drawn from our findings. Thank you for this valuable feedback, which has helped us to enhance the rigor and transparency of our presentation.

      (3) Reward Prediction Analysis:

      The conclusion regarding the IC's role in reward prediction is underdeveloped. While the manuscript presents evidence that IC neurons can encode reward prediction, this is only demonstrated with two example neurons in Figure 6. A more comprehensive analysis of the relationship between IC neuronal activity and reward prediction is necessary. Providing population-level data would significantly strengthen the findings concerning the IC's complex functionalities. Additionally, the discussion of reward prediction in lines 437-445, which describes IC neuron responses in control experiments, does not sufficiently demonstrate that IC neurons can encode reward expectations. It would be valuable to include the responses of IC neurons during trials with incorrect key presses or no key presses to better illustrate this point.

      We deeply appreciate the detailed feedback provided regarding the conclusions on the inferior colliculus (IC)'s role in reward prediction within our manuscript. We acknowledge the importance of a robust and comprehensive presentation of our findings, particularly when discussing complex neural functionalities.

      In response to the reviewer's concerns, we have made the following revisions to strengthen our manuscript:

      (1) Inclusion of Population-Level Data for IC Neurons:

      In the revised manuscript, we have included population-level results for IC neurons in a supplementary figure. Initially, we focused on two example neurons that did not exhibit motor-related responses to key presses to isolate reward-related signals. However, most IC neurons exhibit motor responses during key presses (as indicated in Fig. 7), which can complicate distinguishing between reward-related activity and motor responses. This complexity is why we initially presented neurons without motor responses. To clarify this point, we have added sentences in the Results section to explain the rationale behind our selection of neurons and to address the potential overlap between motor and reward responses in the IC.

      (2) Addition of Data on Key Press Errors and No-Response Trials:

      In response to the reviewer’s suggestion, we present below peri-stimulus time histograms (PSTHs) for two example neurons during error trials, including incorrect key presses and no-response trials. Given that the monkeys performed the task with high accuracy, the number of error trials is relatively small, especially for the control condition (as shown in the top row of the figure). While we remain cautious in drawing definitive conclusions from this limited number of trials, we detected no clear reward signals during the corresponding window (typically centered around 150 ms after the end of the sound). It is important to note that the experiment was initially designed to explore decision-making signals in the IC, rather than focusing specifically on reward processing. However, the data in Fig. 6 demonstrated intriguing signals of reward prediction error, which is why we believe it is important to present them.

      When combined with the results from our reward experiment (Fig. 5), we believe these findings provide compelling evidence of reward prediction errors being processed by IC neurons. Additionally, we observed that the reward prediction error in the IC appears to be signed, meaning that IC neurons showed robust responses to unexpected rewards but not to unexpected no-reward scenarios. However, the sign of the reward prediction error should be explored in greater depth with specifically designed experiments in future studies.

      Author response image 1.

      (A) PSTH of the neuron from Figure 6a during a key press trial under control condition. The number in the parentheses in the legend represents the number of trials for control condition. (B) PSTHs of the neuron from Figure 6a during non-key press trials under experimental conditions. The numbers in the parentheses in the legend represent the number of trials for experimental conditions. (C-D) Equivalent PSTHs as in A-B but from the neuron in Figure 6b.

      We are grateful for the reviewer's insightful suggestions, which have allowed us to improve the depth and rigor of our analysis. We believe these revisions significantly enhance our manuscript's conclusions regarding the complex functionalities of the IC.

    1. Reviewer #2 (Public Review):

      Patterns scored into or painted on durable media have long been considered important markers of the cognitive capabilities of hominins. More specifically, the association of such markers with Homo sapiens has been used to argue that our evolutionary success was in part shaped by our unique ability to code, store and convey information through abstract conventions.

      That singularity of association has been cast into doubt in the last decade with finds of designs apparently painted or carved by Neanderthals, and potentially by even earlier hominins. Even allowing for these developments, however, extending the capability to generate putatively abstract designs to a relatively small-brained hominin like Homo naledi is contentious. The evidential bar for such claims is necessarily high, and I don't believe that it has been cleared here.

      The central issue is that the engravings themselves are not dated. As the authors themselves note, the minimum age constraint provided by U/Th on flowstone does not necessarily relate to the last occupation of the Dinaledi cave system, as the earlier ESR age on teeth does not necessarily document first use of the cave. The authors state that "At present we have no evidence limiting the time period across which H. naledi was active in the cave system". On those grounds though, assigning the age range of presently dated material within the cave system to the engravings - as the current title unambiguously does - is not justifiable.

      Because we don't know when they were made, the association between the engravings and Homo naledi rests on the assertion that no humans entered and made alterations to the cave system between its last occupation by Homo naledi, and its recent scientific recording. This is argued on page 6 with the statement that "No physical or cultural evidence of any other hominin population occurs within this part of the cave system".

      There is an important contrast between the quotes I have referred to in the last two paragraphs. In the earlier quote, the absence of evidence for Homo naledi in the cave system >335 ka and <241 ka is not considered evidence for their absence before or after these ages. Just because we have no evidence that Homo naledi was in the cave at 200 ka doesn't mean they weren't there, which is an argument I think most archaeologists would accept. When it comes to other kinds of humans, though - per the latter quote - the opposite approach is taken. Specifically, the present lack of physical evidence of more recent humans in the cave is considered evidence that no such humans visited the cave until its exploration by cavers 40 years ago. I don't think many archaeologists would consider that argument compelling. I can see why the authors would be drawn to make that assertion, but an absence of evidence cannot be used to argue in one way for use of the cave by Homo naledi and in another way for use of the cave by all other humans.

      A second problem concerns what Homo naledi might have used to make the engravings. The authors state that "The lines appear to have been made by repeatedly and carefully passing a pointed or sharp lithic fragment or tool into the grooves". The authors then describe one rock with superficial similarities to a flake from the more recent site of Blombos to suggest that sharp-edged stones with which to make the engravings were available to Homo naledi. Blombos is considered relevant here presumably because it has evidence for Middle Stone Age engravings. The authors do not, however, demonstrate any usewear on that stone object such as might be expected if it was used to carve dolomite. Given that it is presented as the only such find in the cave system so far, this seems important.

      My greater concern is that the authors did not compare the profile morphology of the Dinaledi engravings with the extensive literature on the morphology of scored lines caused by sharp-edge stone implements (e.g., Braun et al. 2016, Pante et al. 2017). I appreciate that the research group is reticent to undertake any invasive work until necessary, but non-destructive techniques could have been used to produce profiles with which to test the proposition that the engravings were made with a sharp edge stone.

      One thing I noticed in this respect is that the engravings seem very wide, both in absolute terms and relative to their depth. The data I collected from the Middle Stone Age engraved ochre from Klein Kliphuis suggested average line widths typically around 0.1-0.2 mm (Mackay and Welz 2008). The engraved lines at Dinaledi appear to be much wider, perhaps 2-5 mm. This doesn't discount the possibility that the engravings in the Dinaledi system were carved with a sharp edge stone - the range of outcomes for such engravings in soft rock can be quite variable (Hodgskiss 2010) - only that detailed analysis should precede rather than follow any assertion about their mode of formation.

      None of this is to say that the arguments mounted here are wrong. It should be considered possible that Homo naledi made the engravings in the Dinaledi cave system. The problem is that other explanations are not precluded.

      As an example, the western end of the Dinaledi subsystem has a particular geometry to the intersection of its passages, with three dominant orientations, one vertical (which is to say, north-south), and two diagonal (Figure 1). The major lines on Panel A have one repeated vertical orientation and two repeated diagonal orientations (Figure 16), particularly in the upper area not impacted by stromatolites. The lines in both the cave system and engravings in Panel A appear to intersect at similar angles. Several of the cave features appear, superficially at least, to be replicated. In fact, scaled, rotated, and super-imposed, Figure 16 is a plausible 'mud map' of the western end of the Dinaledi system carved incrementally by people exploring the caves. A figure showing this is included here:

      Of course, there are problems with this suggestion. The choice of the upper part of Panel A is selective, the similarity is superficial, and the scales are not necessarily comparable. (Note, btw, that all of those caveats hold equally well for the comparison the authors make between the unmodified rock from Dinaledi and the flake from Blombos in Figure 19). However, the point is that such a 'mud map hypothesis' is, as with the arguments mounted in this paper, both plausible and hard to prove.

      Having read this paper a few times, I am intrigued by the engravings in the Dinaledi system and look forward to learning more about them as this research unfolds. Based on the evidence presently available, however, I feel that we have no robust grounds for asserting when these engravings were made, by whom they were made, or for what reason they were made.

      References:

      • Braun, D. R., et al. (2016). "Cut marks on bone surfaces: influences on variation in the form of traces of ancient behaviour." Interface Focus 6: 20160006.

      • Hodgskiss, T. (2010). "Identifying grinding, scoring and rubbing use-wear on experimental ochre pieces." Journal of Archaeological Science 37: 3344-3358.

      • Mackay, A. & A. Welz (2008). "Engraved ochre from a Middle Stone Age context at Klein Kliphuis in the Western Cape of South Africa." Journal of Archaeological Science 35: 1521-1532.

      • Pante, M. C., et al. (2017). "A new high-resolution 3-D quantitative method for identifying bone surface modifications with implications for the Early Stone Age archaeological record." J Hum Evol 102: 1-11.

    1. Author Response:

      Points from reviewer 1 (Public Review):

      In this manuscript, Yong and colleagues link perturbations in lysosomal lipid metabolism with the generation of protein aggregates resulting from proteasome inhibition.

      We apologize for any confusion in the explanation of the results. We found that both proteasome inhibition and, independently, perturbations to lysosomal lipid metabolism lead to accumulation of protein aggregates in the lysosome. There was no evidence of proteasome inhibition in the context of lysosomal lipid perturbations (Figure 4J).

      Despite using various tools of lysosomal function, acidity, permeability, etc, the authors couldn't identify the link between lysosomal lipid metabolism and protein aggregate formation.

      Indeed, despite testing numerous mechanistic hypotheses, we have yet to explain how perturbation of lysosomal lipid metabolism causes protein aggregates. However, we have demonstrated that lipids are both necessary (via epistasis and serum delipidation) and sufficient (media supplementation) to drive these phenotypes.

      Although this work is interesting and thought-provoking, their approach to identify novel pathways involved in proteostasis is limited and this weakens the contribution of the paper in its current form.

      We are glad the reviewer found the work to be thought-provoking. As a fundamental cellular process critical for longevity, we agree that the connections made here between lipids, lysosomes and protein aggregates are interesting and broaden the impact of cellular health on proteostasis. Though we have falsified multiple hypotheses for how perturbation of lysosomal lipid metabolism could influence protein aggregation, we agree that a major weakness of the current work is our limited mechanistic understanding of this process. We hope that by engaging the thoughtful and creative eLife readership, novel mechanistic hypotheses will emerge.

      Points from reviewer 2 (Public Review):

      This might be too much of an ask, but they should go further in excluding one very attractive alternative model: effects on proteasome activity. This explanation should be addressed definitively because the transcription factor that regulates proteasome subunit gene expression (Nrf1/NFE2L1) is processed in the ER and is therefore well placed to be influenced by membrane conditions, and because it is shown here that proteasome inhibition increases ProteoStat puncta.

      We appreciate the constructive suggestion to examine loss of proteasome expression as a relevant mechanism linking cellular dyslipidemia with proteostasis impairment. We analyzed the genome-wide perturb-seq data from Replogle et al. [1], which was performed in K562 cells cultured under similar conditions to our screen. As expected, perturbation of Nrf1/NFE2L1 reduced expression of proteasome subunits, whereas perturbation of proteasome subunits that increased ProteoStat staining (e.g. PSMD2, PSMD13) homeostatically increased expression of multiple proteasome subunits. In contrast, other top hits, including lipid-related perturbations (e.g. MYLIP, PSAP), did not reduce the expression of genes encoding the proteasome (Author response image 1).

      Author response image 1.

      The relative expression of genes encoding proteasomal subunits for representative genes was re-plotted from genome-wide perturb-seq data in K562 cells [1]. Shown are hit genes that increase Proteostat staining along with non-targeting controls and the positive control gene NFE2L1. Proteasome expression was induced by proteasome impairment (PSMD2 and PSMD13) and repressed by NFE2L1 knockdown. Other hit genes related to lipid metabolism and lysosome function did not consistently impact the expression of proteasome subunits.

      The authors address proteasome activity only by using a dye that is not referenced. Here a much more solid answer is needed.

      We thank Reviewer #2 for bringing to our attention the missing reference for the proteasome activity probe we used (Me4BodipyFL-Ahx3Leu3VS). Both this probe [2] and its close derivative [3], BodipyFL-Ahx3Leu3VS, were fully characterized previously. We’ll include these references in the revision. In our hands, this probe behaved as expected under MG132 and Bortezomib treatment when quantified by flow cytometry (Fig. 4I), and by in-blot fluorescence scan (data will be included as supplementary in the revision). We further observed that HMGCR KD increased proteasome activity, consistent with what’s suggested by current literature. This validated our use of this probe and strongly suggested that proteasome activity was not perturbed by impaired lipid homeostasis.

      In general, most conclusions in the paper rely essentially solely on ProteoStat assays. The entire study would be greatly strengthened if the authors incorporated biochemical or other modalities to substantiate their results.

      We agree that orthogonal characterization of proteostasis impairment would be valuable. We chose the ProteoStat stain as a reporter of proteostasis because it is capable of integrating the aggregation states of multiple endogenously expressed proteins, and in the absence of exogenous stressors such as the overexpression of aggregation-prone proteins. With aging, a context where ProteoStat staining increases, hundreds of proteins exhibit reduced solubility [4], thus motivating the focus on endogenously expressed proteins. Despite the biochemical limitations, we think our work is differentiated from published screens focused on specific metastable proteins by our focus on regulators of endogenous proteostasis.

      The presentation would be improved greatly if the authors provided diagrams illustrating the pathways implicated in their results, as well as their models.

      We thank Reviewer #2 for the helpful suggestion. We have provided the suggested diagrams below (Author response image 2).

      Author response image 2.

      Mechanistic models linking screen hits to accrual of lysosomal protein aggregates, related to Figure 4. Perturbations that increased cholesterol and sphingolipid levels were evaluated for effects on lysosomal pH, lysosomal proteolytic capacity, lysosomal membrane permeability, lipid peroxidation and proteasome activity. None of these mechanisms appear to play a causal role in protein aggregation in response to elevated lipids.

      Author Response References

      1. Replogle, J. M. et al. Mapping information-rich genotype-phenotype landscapes with genome-scale Perturb-seq. Cell 185, 2559-2575.e28 (2022).

      2. Berkers, C. R. et al. Probing the Specificity and Activity Profiles of the Proteasome Inhibitors Bortezomib and Delanzomib. Mol Pharmaceut 9, 1126–1135 (2012).

      3. Berkers, C. R. et al. Profiling Proteasome Activity in Tissue with Fluorescent Probes. Mol. Pharmaceutics 4, 739–748 (2007).

      4. David, D. C. et al. Widespread Protein Aggregation as an Inherent Part of Aging in C. elegans. Plos Biol 8, e1000450 (2010).

    1. Author Response

      We would like to thank the reviewers for providing constructive feedback on the manuscript. To address the weaknesses identified, we are performing additional experiments and generating additional data, to be added to the updated manuscript.

      (1) The utility of a pipeline depends on the generalization properties.

      While the proposed pipeline seems to work for the data the authors acquired, it is unclear if this pipeline will actually generalize to novel data sets possibly recorded by a different microscope (e.g. different brand), or different imaging conditions (e.g. illumination or different imaging artifacts) or even to different brain regions or animal species, etc.

      The authors provide a 'black-box' approach that might work well for their particular data sets and image acquisition settings but it is left unclear how this pipeline is actually widely applicable to other conditions as such data is not provided.

      In my experience, without well-defined image pre-processing steps and without training on a wide range of image conditions pipelines typically require significant retraining, which in turn requires generating sufficient amounts of training data, partly defying the purpose of the pipeline. It is unclear from the manuscript, how well this pipeline will perform on novel data possibly recorded by a different lab or with a different microscope.

      To address generalizability, we are performing several validation experiments with data from different 1) channels, 2) species (rat), and 3) microscopes, to highlight the robustness of our deep learning (DL) segmentation model to out-of-distribution data with different characteristics and acquisition protocols. We first used our model to segment three images (507x507 x&y, 250-170 um z) from three C57BL/6 mice acquired on the same two-photon fluorescence microscope following the same imaging protocol. The vasculature was labelled with Texas Red dextran, as in the current experiment. In place of the EYFP signal from pyramidal neurons (2nd channel), Gaussian noise was generated with a mean and standard deviation identical to those of the acquired vascular channel. We then segmented a second set of two images (507x507 x&y, 300-400 um z) from two Fischer rats with Alexa680-dextran label in the plasma; these rats were imaged on the same two-photon fluorescence microscope, but with galvano scanners (instead of resonant scanners). A second channel of random Gaussian noise was also added here. Finally, an image of vasculature from an ex vivo cleared mouse brain (1665x1205x780 um), imaged on a light sheet fluorescence microscope (Miltenyi UltraMicroscope Blaze), was also segmented with our model. Lectin-DyLight 649 was used to label the vasculature in this cohort. The Dice score, precision, recall, 95% Hausdorff distance, and mean surface distance will be reported for all of these additional image segmentations, upon generation of ground truth images. Examples of the generated segmentation masks are presented in Author response image 1 for visual comparison. Of final note, should the segmentation results on a new data set be unsatisfactory, the methods downstream from segmentation are still applicable and the model can be further fine-tuned on other out-of-distribution data.

      Author response image 1.

      Examples of the deep learning model output on out of distribution data from a different mouse strain, from a different species (Fischer rat), and on a different microscope using a different imaging modality.
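      As an illustration of the validation steps described above, the following minimal sketch shows how a placeholder noise channel and the overlap metrics (Dice score, precision, recall) could be computed. It is not the authors' code: the function names `gaussian_noise_channel` and `overlap_metrics`, the flat-list representation of voxels, and the fixed random seed are all assumptions made for illustration.

```python
import random


def gaussian_noise_channel(vascular, seed=0):
    """Return noise with the same mean and standard deviation as `vascular`.

    `vascular` is a flat list of voxel intensities (hypothetical layout).
    """
    rng = random.Random(seed)
    n = len(vascular)
    mu = sum(vascular) / n
    sigma = (sum((v - mu) ** 2 for v in vascular) / n) ** 0.5
    return [rng.gauss(mu, sigma) for _ in vascular]


def overlap_metrics(pred, truth):
    """Dice score, precision and recall for two binary masks (flat lists)."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Two empty masks agree perfectly, so Dice is defined as 1.0 in that case.
    dice = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 1.0
    return {"dice": dice, "precision": precision, "recall": recall}


if __name__ == "__main__":
    predicted = [1, 1, 0, 0, 1]
    ground_truth = [1, 0, 0, 1, 1]
    print(overlap_metrics(predicted, ground_truth))
    print(len(gaussian_noise_channel([10.0, 12.0, 11.0, 9.0])))
```

      Distance-based metrics such as the 95% Hausdorff distance and mean surface distance need surface extraction in 3D and are omitted from this sketch.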

      (2) Some of the chosen analysis results seem to not fully match the shown data, or the visualization of the data is hard to interpret in the current form.

      We are updating the visualizations to make them more accessible and we will ensure matching between tables and figures.

      (3) Additionally, some measures seem not fully adapted to the current situation (e.g. the efficiency measure does not consider possible sources or sinks). Thus, some additional analysis work might be required to account for this.

      Thank you for your comment. The efficiency metric was selected as it does not consider sources or sinks. We do agree that accounting for vessel subtypes in the analysis (thus classifying larger vessels as either supplying or draining) would be uniquely useful: notwithstanding, it is extremely laborious. We are therefore leveraging machine learning in a parallel project to afford vessel classification by subtype. The source/sink analysis is also confounded by the small field-of-view of in situ 2PFM. Future work will investigate network remodelling across the whole brain with ex-vivo light sheet fluorescence microscopy.

      (4) The authors apply their method to in vivo data. However, there are some weaknesses in the design that make it hard to accept many of the conclusions and even to see that the method could yield much useful data with this type of application. Primarily, the acquisition of a large volume of tissue is very slow. In order to obtain a network of vascular activity, large volumes are imaged with high resolution. However, the volumes are scanned once every 42 seconds following stimulation. Most vascular responses to neuronal activation have come and gone in 42 seconds so each vessel segment is only being sampled at a single time point in the vascular response. So all of the data on diameter changes are impossible to compare since some vessels are sampled during the initial phase of the vascular response, some during the decay, and many probably after it has already returned to baseline. The authors attempt to overcome this by alternating the direction of the scan (from surface to deep and vice versa). But this only provides two sample points along the vascular response curve and so the problem still remains.

      We thank the Reviewer for bringing up this important point.

      Although vessels can show relatively rapid responses to perturbation, vascular responses to photostimulation of Channelrhodopsin-2 in neighbouring neurons are typically long lasting: they do not come and go in 42 seconds. To demonstrate this point, we acquired higher temporal-resolution images of smaller volumes of tissue over 5 minutes preceding and following the 5-s photoactivation with the original parameters. The imaging protocol differed in that we utilized a piezoelectric motor, a smaller field of view, and only 3x frame averaging, resulting in a temporal resolution of 1.57-2.63 seconds. This acquisition was repeated at 4 different cortical depths (325 um, 250 um, 150 um, and 40 um) in a single mouse. The vascular radii were estimated using our presented pipeline. Raw data and LOESS fits are shown in Author response image 2 (below). Vessels shorter than 20 um in length were excluded from the analysis. A video of one of the acquisitions is shown along with the timecourses of select vessels’ caliber changes in Author response image 3. The vascular caliber changes following photostimulation persisted for several minutes, consistent with earlier observations by us and others (1–4). These higher temporal-resolution scans of smaller tissue volumes will be repeated in two more mice; we will therein assess the repeatability of individual vessel responses to repeated stimulations.

      Author response image 2.

      A. The vascular radii of multiple vessels were imaged at 4 different cortical depths, each within a 507 x (75-150) x (30-45)um tissue volume. Baseline scanning lasted for 5 minutes, followed by 5 seconds of blue or green light stimulation at 4.3 mW/mm2, and culminating in 5 minutes of post-stimulation scanning. B. LOESS fits of the vessel radius estimates for each vessel segment identified.
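      For readers unfamiliar with LOESS, the sketch below shows one simple way such a fit to a vessel radius timecourse could be computed: a local linear regression with tricube weights. It is an illustrative assumption, not the authors' implementation; the function name `loess`, the `frac` span parameter, and the example timecourse are hypothetical.

```python
import math


def loess(x, y, frac=0.5):
    """Local linear regression with tricube weights, evaluated at each x.

    `frac` is the fraction of points used in each local fit (an assumed
    default; the response does not state the span that was used).
    """
    n = len(x)
    k = max(2, min(n, int(math.ceil(frac * n))))
    fitted = []
    for x0 in x:
        # Bandwidth: distance to the k-th nearest neighbour of x0.
        h = max(sorted(abs(xi - x0) for xi in x)[k - 1], 1e-12)
        w = [(1.0 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        swx = sum(wi * xi for wi, xi in zip(w, x))
        swy = sum(wi * yi for wi, yi in zip(w, y))
        swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
        denom = sw * swxx - swx * swx
        if abs(denom) < 1e-12:
            # Degenerate neighbourhood: fall back to the weighted mean.
            fitted.append(swy / sw)
        else:
            slope = (sw * swxy - swx * swy) / denom
            intercept = (swy - slope * swx) / sw
            fitted.append(intercept + slope * x0)
    return fitted


if __name__ == "__main__":
    # Hypothetical radius samples (um) at roughly 2-s intervals.
    t = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0]
    r = [2.0, 2.1, 1.9, 2.4, 2.6, 2.5, 2.7, 2.6]
    print([round(v, 3) for v in loess(t, r, frac=0.6)])
```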

      Author response image 3.

      Estimated vascular radius at each timepoint for select vessels from the imaging stack shown in the following video: https://flip.com/s/kB1eTwYzwMJE

      (5) A second problem is the use of optogenetic stimulation to activate the tissue. First, it has been shown that blue light itself can increase blood flow (Rungta et al 2017). The authors note the concern about temperature increases but that is not the same issue. The discussion mentions that non-transgenic mice were used to control for this with "data not shown". This is very important data given these earlier reports that have found such effects and so should be included.

      We will update the manuscript to incorporate the data on volumetric scanning in nontransgenic C57BL/6 mice undergoing blue light stimulation, with parameters identical to those used in Thy1-ChR2 mice. As before, responders were identified as vessels that, following blue light stimulation, show a radius change greater than twice their baseline radius standard deviation; their estimated radius changes are shown in Author response image 4 below. There was no statistically significant difference between the radius distributions of any of the photostimulation conditions and the pre-photostimulation baseline. A comparison of this with the transgenic Thy1-ChR2-EYFP mice will be included in manuscript updates.

      Author response image 4.

      Radius change measurements for responding vessels from the Thy1-ChR2 mice described in the manuscript (top row) vs. 4 wild-type C57BL/6J mice (bottom row). Response to photostimulation was defined as a change above twice their baseline standard deviation. 458 nm light was applied at 1.1 mW/mm^2 and 4.3 mW/mm^2, while 552 nm light was applied at 4.3 mW/mm^2. No statistically significant differences were observed between the radii distributions in any condition (Wilcoxon test, Bonferroni correction).
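      The responder criterion stated above (a radius change exceeding twice the vessel's baseline standard deviation) can be sketched as follows. This is a hedged illustration only: the function name `classify_vessel`, the use of the mean baseline radius as the reference value, and the sample standard deviation are assumptions, since the response does not spell out these details.

```python
from statistics import mean, stdev


def classify_vessel(baseline_radii, post_radius, k=2.0):
    """Label a vessel by comparing its post-stimulation radius to baseline.

    The change is taken relative to the mean baseline radius (assumption),
    and the threshold is k times the baseline sample standard deviation.
    """
    base_mean = mean(baseline_radii)
    base_sd = stdev(baseline_radii)
    change = post_radius - base_mean
    if change > k * base_sd:
        return "dilator"
    if change < -k * base_sd:
        return "constrictor"
    return "non-responder"


if __name__ == "__main__":
    baseline = [2.0, 2.1, 1.9, 2.0, 2.0]  # hypothetical radii in um
    for post in (2.3, 1.7, 2.05):
        print(post, classify_vessel(baseline, post))
```

      A consequence of this design is that vessels with highly variable baselines need larger absolute changes to count as responders, which is relevant to the reviewer's question about large vessels.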

      (6) Secondly, there doesn't seem to be any monitoring of neural activity following the photo-stimulation. The authors repeatedly mention "activated" neurons and claim that vessel properties change based on distance from "activated" neurons. But I can't find anything to suggest that they know which neurons were active versus just labeled. Third, the stimulation laser is focused at a single depth plane. Since it is single-photon excitation, there is likely a large volume of activated neurons. But there is no way of knowing the spatial arrangement of neural activity and so again, including this as a factor in the analysis of vascular responses seems unjustified.

      Given the high fidelity of Channelrhodopsin-2 activation with blue light, we assume that all labeled neurons within the volume of photostimulation are being activated. Depending on their respective connectivities, their postsynaptic neurons (whether or not they are labelled) are also activated. We indeed agree with the reviewer that the spatial distribution of neuronal activation is not well defined. We will revise the manuscript to update the terminology from activated to labeled neurons and stress in the Discussion that the motivation for assessing the distance to the closest labelled neuron as one of our metrics is purely to demonstrate the possibility of linking vascular response to activations in some of their neighbouring neurons and including morphological metrics in the computational pipeline. Of final note, the depth-dependence of the distance between labelled neurons and responding vessels can also readily be assessed using our computational pipeline.

      (7) The study could also benefit from more clear illustration of the quality of the model's output. It is hard to tell from static images of 3-D volumes how accurate the vessel segmentation is. Perhaps some videos going through the volume with the masks overlaid would provide some clarity. Also, a comparison to commercial vessel segmentation programs would be useful in addition to benchmarking to the ground truth manual data.

      We generated a video demonstrating the deep-learning model outputs and have made the video available here: https://flip.com/s/_XBs4yVxisNs Additional videos will be uploaded.

      (8) Another useful metric for the model's success would be the reproducibility of the vessel responses. Seeing such a large number of vessels showing constrictions raises some flags and so showing that the model pulled out the same response from the same vessels across multiple repetitions would make such data easier to accept.

      We have generated a figure demonstrating the repeatability of the vascular responses following photoactivation in a volume, and presented them next to the corresponding raw acquisitions for visual inspection. It is important to note that there is a significant biological variability in vessels’ responses to repeated stimulation, as described previously (2,5). Constrictions have been reported in the literature by our group and others (1,3,4,6,7), though their prevalence has not been systematically studied to date. Concerning the reproducibility of our analysis, we will demonstrate model reproducibility (as a metric of its success) in the updated manuscript.

      Author response image 5.

      Registered acquisitions of the vasculature before and after optogenetic stimulation for 5 scan pairs over 3 different stimulation conditions. The estimated radii along vessel segments are presented.

      Author response image 6.

      Sample capillary constrictions from maximum intensity projections at repeated timepoints following optogenetic stimulation. Baseline (pre-stimulation) image is shown on the left and the post-stimulation image, on the right, with the estimated radius changes listed to the left.

      (9) A number of findings are questionable, at least in part due to these design properties. There are unrealistically large dilations and constrictions indicated. These are likely due to artifacts of the automated platform. Inspection of these results by eye would help understand what is going on.

      Some of the dilations were indeed large in magnitude. We present select examples of large dilations and constrictions ranging in magnitude from 2.08 to 10.80 um for visual inspection (for reference, the average magnitude of radius changes across vessels and stimuli was 0.32 +/- 0.54 um). Diameter changes above 5 um were visually inspected.

      Author response image 7.

      Additional views of diameter changes in maximum intensity projections ranging in magnitude from 2.08 um to 10.80 um.

      (10) In Figure 6, there doesn't seem to be much correlation between vessels with large baseline level changes and vessels with large stimulus-evoked changes. It would be expected that large arteries would have a lot of variability in both conditions and veins much less. There is also not much within-vessel consistency. For instance, the third row shows what looks like a surface vessel constricting to stimulation but a branch coming off of it dilating - this seems biologically unrealistic.

      We now plot photostimulation-elicited vesselwise radius changes vs. their corresponding baseline radius standard deviations (Author response image 8 below). The Pearson correlation between the baseline standard deviation and the radius change was 0.08 (p<1e-5) for 552nm 4.3 mW/mm^2 stimulation, -0.08 (p<1e-5) for 458nm 1.1 mW/mm^2 stimulation, and -0.04 (p<1e-5) for 458nm 4.3 mW/mm^2 stimulation. For non-control (i.e. blue) photostimulation conditions, the change in the radius is thus negatively correlated with the vessel’s baseline radius standard deviation. The within-vessel consistency is explicitly evaluated in Figure 8 of the manuscript. As for the instance of a surface vessel constricting while a downstream vessel dilates, it is important to remember that the 2PFM FOV restricts us to imaging a very small portion of the cortical microvascular network: one (among many) daughter vessel showing changes in the opposite direction to the parent vessel does not violate the conservation of mass.

      Author response image 8.

      A plot of the vessel radius change elicited by photostimulation vs. baseline radius standard deviation.
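      The correlation analysis described above can be sketched as a plain Pearson correlation between each vessel's baseline radius standard deviation and its stimulation-evoked radius change. The function name `pearson_r` and the example values are hypothetical, and the significance test behind the reported p-values is not reproduced here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical per-vessel values (um).
    baseline_sd = [0.10, 0.25, 0.15, 0.40, 0.30]
    radius_change = [0.30, -0.10, 0.20, -0.40, 0.05]
    print(round(pearson_r(baseline_sd, radius_change), 3))
```

      With many thousands of vessels, even coefficients as small as those reported can reach very low p-values, so the effect size and the significance level are worth reading separately.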

      (11) As mentioned, the large proportion of constricting capillaries is not something found in the literature. Do these happen at a certain time point following the stimulation? Did the same vessel segments show dilation at times and constriction at other times? In fact, the overall proportion of dilators and constrictors is not given. Are they spatially clustered? The assortativity result implies that there is some clustering, and the theory of blood stealing by active tissue from inactive tissue is cited. However, this theory would imply a region where virtually all vessels are dilating and another region away from the active tissue with constrictions. Was anything that dramatic seen?

      The kinetics of the vascular responses are not accessible via the current imaging protocol and acquired data; however, this computational pipeline can readily be adapted to test hypotheses surrounding the temporal evolution of the vascular responses, as shown in Author response image 2 (with higher temporal-resolution data). Some vessels dilate at some time points and constrict at others as shown in Author response image 2. As listed in Table 2, 4.4% of all vessels constrict and 7.5% dilate for 458nm stimulation at 4.3 mW/mm^2. There was no obvious spatial clustering of dilators or constrictors: we expect such spatial patterns to more likely result from different modes of stimulation and/or in the presence of a pathology. The assortativity peaked at 0.4 (i.e. is quite far from 1 where each vessel’s response exactly matches that of its neighbour).

      (12) Why were nearly all vessels > 5um diameter not responding >2SD above baseline? Did they have highly variable baselines or small responses? Usually, bigger vessels respond strongly to local neural activity.

      In Author response image 9, we now present the stimulation-induced radius changes vs. baseline radius variability across vessels with a radius greater than 5 um. The Pearson correlation between the radius change and the baseline radius standard deviation was 0.04 (p=0.5) for 552nm 4.3 mW/mm^2 stimulation, -0.26 (p<1e-5) for 458nm 1.1 mW/mm^2 stimulation, and -0.24 (p<1e-5) for 458nm 4.3 mW/mm^2 stimulation. We will incorporate an additional analysis to address this issue by identifying responding vessels as those showing supra-threshold percent change in their radius (instead of SD).

      Author response image 9.

      A plot of the vessel radius change elicited by photostimulation vs. baseline radius standard deviation in vessels with a baseline radius greater than 5 um.

      References

      (1) Alarcon-Martinez L, Villafranca-Baughman D, Quintero H, et al. Interpericyte tunnelling nanotubes regulate neurovascular coupling. Nature. 2020;585(7823):91-95. doi:10.1038/s41586-020-2589-x

      (2) Mester JR, Bazzigaluppi P, Weisspapir I, et al. In vivo neurovascular response to focused photoactivation of Channelrhodopsin-2. NeuroImage. 2019;192:135-144. doi:10.1016/j.neuroimage.2019.01.036

      (3) O’Herron PJ, Hartmann DA, Xie K, Kara P, Shih AY. 3D optogenetic control of arteriole diameter in vivo. Nelson MT, Calabrese RL, Nelson MT, Devor A, Rungta R, eds. eLife. 2022;11:e72802. doi:10.7554/eLife.72802

      (4) Hartmann DA, Berthiaume AA, Grant RI, et al. Brain capillary pericytes exert a substantial but slow influence on blood flow. Nat Neurosci. Published online February 18, 2021:1-13. doi:10.1038/s41593-020-00793-2

      (5) Mester JR, Bazzigaluppi P, Dorr A, et al. Attenuation of tonic inhibition prevents chronic neurovascular impairments in a Thy1-ChR2 mouse model of repeated, mild traumatic brain injury. Theranostics. 2021;11(16):7685-7699. doi:10.7150/thno.60190

      (6) Mester JR, Rozak MW, Dorr A, Goubran M, Sled JG, Stefanovic B. Network response of brain microvasculature to neuronal stimulation. NeuroImage. 2024;287:120512. doi:10.1016/j.neuroimage.2024.120512

      (7) Hall CN, Reynell C, Gesslein B, et al. Capillary pericytes regulate cerebral blood flow in health and disease. Nature. 2014;508(7494):55-60. doi:10.1038/nature13165

    1. Author Response

      We thank both the editors and the Reviewers for their thoughtful comments and recommendations, which will certainly help us improve the manuscript. Below we briefly address some of the comments made, and then outline the changes to the manuscript that we plan to implement in the revision.

      We see three interrelated issues in the comments of the Reviewers:

      • the length and complexity of the manuscript;

      • the link to previously proposed formalisms;

      • the impact of adopting the proposed information-theoretic framework.

      With regard to all of these issues, we would first like to highlight that the overall goal of our effort was to integrate contributions to understanding the mechanisms underlying cognitive control across multiple different disciplines, using the information theoretic framework as a common formalism, while respecting and building on prior efforts as much as possible. Accordingly, we sought to be as explicit as possible about how we bridge from prior work using information theory, as well as neural networks and dynamical systems theory, which contributed to the length of the original manuscript. While we continue to consider this an important goal, we will do our best to shorten and clarify the main exposition by reorganizing the manuscript as suggested by Reviewer #1 (i.e., in a way that is similar to what we did in our previous Nature Physics paper on multitasking). Specifically, we will move a substantially greater amount of the bridging material to the Supplementary Information (SI), including the detailed discussion of the Stroop task, and the description of the link to Koechlin & Summerfield’s [L1] information theory formalism. We will also now include an outline of the full model at the beginning of the manuscript, that includes control and learning, and then more succinctly describe simplifications that focus on specific issues and applications in the remainder of the document.

      Along similar lines, we will revise and harmonize our presentation of the formalism and notations, to make these more consistent, clearer and more concise throughout the document. Again, some of the inconsistencies in notation arose from our initial description of previous work, and in particular that of Koechlin & Summerfield [L1], which was an important inspiration for our work but used slightly different notations. An important motivation for our introduction of new notation was that their formulation focused on the performance of a single task at a time, whereas a primary goal of our work was to extend the information theoretic treatment to simultaneous performance of multiple tasks. That is, in focusing on single tasks, Koechlin & Summerfield could refer to a task simply as a direct association between stimuli and responses, whereas we required a way of being able to refer to sets of tasks performed at once (“multitasks”), which in turn required specification of internal pathways. Moreover, they do not provide a way to explicitly compute the conditional information Q(a|s) of a response/action a conditioned on a stimulus s. Our formalism instead provides a way to explicitly unpack this expression in terms of the efficacies, automatic (Eq. 5) or controlled (Eq. 15), which can also account for the competition between different stimuli {s1, s2, . . . sn}. It also describes explicitly the competition between multiple tasks (Eq. 18, and Eq. 25 for multiple layers), because different processing schemes for the same combinations of stimuli/responses can incur different levels of internal dependencies and thus require different control strategies.

      To mitigate any confusion over terminology we will, as noted above, move a detailed discussion of Koechlin & Summerfield’s formulation, and how it maps to the one we present, to the SI, while taking care to introduce ours clearly at the beginning of the main document, and use it consistently throughout the remainder of the document. We will also draw more clearly an important distinction, between informational and cognitive costs, that we did not make adequately in the original manuscript.

      Finally, to more clearly and concretely convey what we consider to be the most important contributions, we will restrict the number of examples we present to ones that relate most directly to the central points (e.g., the effect and limits of control in the presence of interference, and the differences in control strategy under limited temporal horizons). Accompanying our revision, we will also provide a full point-by-point response to the comments and questions raised by the Reviewers. We summarize some of the key points we will address below.

      PRELIMINARY REPLY TO THE REPORT OF REVIEWER #1

      We want to thank the Reviewer for the time and effort put into reviewing our paper and for the constructive feedback that was provided. We also thank the Reviewer for recognizing the need for a clear computational account of how “control” manages conflicts by scheduling tasks to be executed in parallel versus serially, and for the positive evaluation of the “efforts of the authors to give these intuitions a more concrete computational grounding.” As noted in the general reply above, we regret the lack of clarity in several parts of the manuscript and in our introduction and use of the formalism. We consider the following to be the main points to be addressed:

      • the role of task graphs and their mapping to standard neural architectures;

      • the description of entropy and related information-theoretic concepts;

      • confusing choice of symbols in our notation between stimuli/responses and serialization/reconfiguration costs;

      • missing definition of response time.

      Regarding the first point, we acknowledge that the network architectures we focus on do not draw direct inspiration from conventional machine learning models. Instead, our approach is rooted in the longstanding tradition of using (often simpler, but also more readily interpretable) neural network models to address human cognitive function and how this may be implemented in the brain [L2]; and, in particular, the mechanisms underlying cognitive control (e.g., [L3, L4]). In this context, we emphasize that, for analytical clarity, we deliberately abstract away from many biological details, in an effort to identify those principles of function that are most relevant to cognitive function. Nevertheless, our network architecture is inspired by two concepts that are central to neurobiological mechanisms of control: inhibition and gain modulation. Specifically, we incorporate mutual inhibition among neural processing units, a feature represented by the parameter β. This aspect of our model is consistent with biologically inspired frameworks of neural processing, such as those discussed by Munakata et al. (2011) [L5], reflecting the competitive dynamics observed in neural circuits. Moreover, we introduce the parameter ν to represent a strictly modulatory form of control, akin to the role of neuromodulators in the brain. This modulatory control adjusts the sensitivity of a node to differences among its inputs (e.g., Servan-Schreiber, Printz, & Cohen (1990) [L6]; Aston-Jones & Cohen (2005) [L7]). Finally, as the Reviewer notes, additional hidden layers can improve expressivity in neural networks, enabling the efficient implementation of more complex tasks, and are a universal feature of biological and artificial neural systems. We thus examined multitasking capability under the assumption that multiple hidden layers are present in a network, irrespective of whether they are needed to implement the corresponding tasks.

      Regarding the second point, as noted above, we believe that the confusion arose from our review of the work by Koechlin & Summerfield. In their formalism, in which an action a is chosen (from a set of potential actions) with probability p(a), the cost of choosing that action is − log p(a). This is usually referred to as the information content or, alternatively, the local entropy [L8]. As the Reviewer correctly observed, the canonical (Shannon) entropy is actually the expectation E_a[− log p(a)] of the local entropy over the set of actions. In summarizing their formulation, we misleadingly stated that “they used standard Shannon entropy formalism as a measure of the information required to select the action a.” We will now correct this to state: “[..] they used local entropy (− log p(a)) as a measure of the information required to select the action a, which can be treated as the cost of choosing that action.” We follow this formulation in our own, referring to informational cost as Ψ, and generalizing this to include cases in which more than one action may be chosen to perform at a time.
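
      To make the distinction concrete, the following minimal Python sketch computes the information content − log p(a) of individual actions and the Shannon entropy as its expectation. The action probabilities and the function names are hypothetical, chosen only for illustration; they are not taken from the manuscript or from Koechlin & Summerfield.

```python
import math


def information_content(p):
    """Cost of choosing an action with probability p: -log p, in nats."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    return -math.log(p)


def shannon_entropy(probs):
    """Expectation of the information content over a distribution."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Zero-probability actions contribute nothing to the expectation.
    return sum(p * information_content(p) for p in probs if p > 0.0)


# Hypothetical distribution over three actions (illustration only).
p_actions = [0.5, 0.25, 0.25]

# Per-action cost: rarer actions are more costly to select.
costs = [information_content(p) for p in p_actions]

# Shannon entropy: the probability-weighted average of those costs.
entropy = shannon_entropy(p_actions)
```

      In this toy case the per-action costs differ (the two less likely actions cost twice as much as the likely one), whereas the entropy is a single number that summarizes the whole distribution.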

      Regarding the third point, the confusion is due to our use of the letters S and R for both the stimulus and response units (in Sec. II.B) and the serialization and reconstruction costs (in Eqs. 31-33). We will fix this by renaming the serialization and reconstruction costs more explicitly as Ser and Rec.

      Finally, we realized that we never explicitly stated the expression for the response time that we used, but only pointed to it in the literature. In the manuscript we used the expression given in Eq. 53 of [L9], which provides response times as a function of the error rate ER and the number of options.

      PRELIMINARY REPLY TO THE REPORT OF REVIEWER #2

      We want to thank the Reviewer for recognizing our effort to “rigorously synthesize ideas about multi-tasking within an information-theoretic framework” and its potential. We also thank the Reviewer for the careful comments.

      To our best understanding, and similarly to Reviewer #1, the main comments of the Reviewer are on:

      • the length and density of the paper;

      • the presentation of Koechlin & Summerfield’s formalism, and the mismatch/lack of clarity of ours in certain points;

      • the added value of the information theoretic formalism.

      Regarding the first two points, which are common to Reviewer #1, we plan to move a significant part of the manuscript to the Supplementary Information, both to improve readability and make the manuscript shorter, as well as to provide one consistent and cleaner formalism (in particular with regard to the typos and errors highlighted by the Reviewer). In particular, with respect to the comment on Eqs. 4-6, we will clarify that the probability p[f_ij] is the probability that a certain input dimension (i in this case) is selected by node j to produce its response (averaged over the individual inputs in each input dimension). We will also take care to make sure that the definition and domain of the various probabilities and probability distributions we use are clearly delineated (e.g., where the costs computed for tasks and task pathways come from).

      Regarding the third point, we hope that our work offers value in at least two ways: i) it helps bring unity to ideas and descriptions about the capacity constraints associated with cognitive control that have previously been articulated in different forms (viz., neural networks, dynamical systems, and statistical mechanical accounts); and ii) doing so within an information theoretic framework not only lends rigor and precision to the formulation, but also allows us to cast the allocation of control in normative form – that is, as an optimization problem in which the agent seeks to minimize costs while maximizing gains. While we do not address specific empirical phenomena or datasets in the present treatment, we have done our best to provide examples showing that: a) our information theoretic formulation aligns with treatments using other formalisms that have been used to address empirical phenomena (e.g., with neural network models of the Stroop task); and b) our formulation can be used as a framework for providing a normative approach to widely studied empirical phenomena (e.g., the transition from control-dependent to automatic processing during skill acquisition) that, to date, have been addressed largely from a descriptive perspective; and that it can provide a formally rigorous approach to addressing such phenomena.

      [L1] E. Koechlin and C. Summerfield, Trends in cognitive sciences 11, 229 (2007).

      [L2] J. L. McClelland, D. E. Rumelhart, P. R. Group, et al., Explorations in the Microstructure of Cognition 2, 216 (1986).

      [L3] J. D. Cohen, K. Dunbar, and J. L. McClelland, Psychological Review 97, 332 (1990).

      [L4] E. K. Miller and J. D. Cohen, Annual review of neuroscience 24, 167 (2001).

      [L5] Y. Munakata, S. A. Herd, C. H. Chatham, B. E. Depue, M. T. Banich, and R. C. O’Reilly, Trends in cognitive sciences 15, 453 (2011).

      [L6] D. Servan-Schreiber, H. Printz, and J. D. Cohen, Science 249, 892 (1990).

      [L7] G. Aston-Jones and J. D. Cohen, Annu. Rev. Neurosci. 28, 403 (2005).

      [L8] T. F. Varley, PLOS ONE 19, e0297128 (2024).

      [L9] T. McMillen and P. Holmes, Journal of Mathematical Psychology 50, 30 (2006).

    1. Author response:

      We would like to thank the three reviewers for the careful review and thoughtful comments on our manuscript. In addition to providing useful suggestions, they uncovered some embarrassing oversights on our part related to experimental details, including the number of embryos and quantification of variance in the observed changes for some of the experiments, which were inadvertently omitted from the submission. We provide below an initial response to the reviewers’ public reviews and expect to submit a revised manuscript comprehensively addressing all their concerns.

      We would like to start by addressing some of their most critical comments, related to validation of the tools used to reduce soxB1 gene family function in the embryo. In the absence of the critical supplementary data that we inadvertently failed to include, the reviewers were left with the understandable but, we feel, erroneous impression that there was insufficient validation of the mutant and knockdown tools.

      Reviewer #2 says “The sox2y589 mutant line is not properly verified in this manuscript, which could be done by examining anti-Sox2 antibody labeling, Western blot analysis or…”

      This validation, which had been performed previously both with antibody staining and with western blot analysis, was inadvertently omitted from the supplementary data submitted with the paper. The western blot data is shown here.

      Author response image 1.

      Validation of sox2 mutant phenotype with Western blot.

      Lysates were prepared from 25 embryos selected as wild type or potentially mutant based on the “loss of L1” phenotype at 6 dpf. This polyclonal antibody recognizes an epitope within the last 16 amino acids of the C-terminus.

      Author response image 2.

      Validation of sox2 mutant phenotype with antibody staining.

      Though in this experiment there was considerable background in the red channel, and it shows the lateral line nerve, loss of nuclear Sox2 expression is evident in the deposited neuromast of an embryo identified as a mutant based on its delayed deposition of the L1 neuromast.

      This data and a repeat of the antibody staining showing the primordium with loss of Sox2 will be included in a revised manuscript.

      Furthermore, Reviewer #2 comments “the authors show that the anti-Sox2 and anti-Sox3 antibody labeling is reduced but not absent in sox2 MO1 and sox3 MO-injected embryos, but do not show antibody labeling of the sox2 MO and sox3 MO-double injected embryos to determine if there is an additional knockdown.”

      This will be included in a revised manuscript.

      Reviewer #2:

      The authors acknowledge that the sox2 MO1 used in this manuscript also alters sox3 function, but do not redo the experiments with a specific sox2 MO

      This is not exactly true. Having discovered that sox2 MO1 simultaneously reduces sox2 and sox3 function, we obtained three new morpholinos based on another paper (Kamachi et al 2008), which had quantitatively assessed the efficacy of three sox2-specific morpholinos (sox2 MO2, sox2 MO3, and sox2 MO4). The effects of these morpholinos on the pattern of L1 deposition were compared to those of sox2 MO1. This comparison was shown in supplementary Figure 2 and is included below. It shows that the sox2-specific morpholinos resulted in a poorly penetrant delay in deposition of L1, comparable to that of a sox2 mutant, which was quantified in supplementary Figure 3B. The observations with these three sox2-specific morpholinos independently supported the observations made with the sox2 mutant: reduction of sox2 on its own results in a delay in deposition of the first neuromast with low penetrance, and to effectively examine the role of these SoxB1 genes in the primordium their function needs to be compromised in a combinatorial manner. This conclusion was independently supported by observations made by crossing sox1a, sox2 and sox3 mutants (Figure 3 and Supplementary Figure 3). Therefore, even though the initial use of a sox2 morpholino that simultaneously knocks down sox3 was unintentional, it turned out to be useful: it allowed us to examine the effects of knocking down sox2 and sox3 with a single morpholino. Furthermore, though this project was initiated more than 15 years ago to specifically understand sox2 function, our focus had shifted to understanding the role of the soxB1 family members sox1a, sox2 and sox3 functioning together as an interacting system that regulates Wnt activity in the primordium. Considering this broader focus, reflected in the title of the paper, it was not a priority to repeat every experiment previously done with sox2 MO1 with the new sox2-specific morpholinos. Instead, having acknowledged the “limitations” of sox2 MO1, we used it to better understand the effects of combinatorial reduction of SoxB1 function.

      Reviewer #1:

      It is not exactly clear what underlies the apparent redundancy. It would be helpful if the soxb gene family member expression was reported after loss of each.

      As suggested by reviewer #1, we had previously looked at changes in expression of each of the soxB1 factors following loss of individual soxB1 factors but had not included these data in the supplementary data with the original submission. Apart from a reproducible and consistent expansion of sox1a expression into the trailing zone following loss of sox2 function, which is reported in the paper and quantified here, where 10/10 mutant embryos showed the expansion (compare the region within the bracket in WT and sox2<sup>-/-</sup>), no consistent changes in the expression of other soxB1 family members were observed that might account for compensation when the function of a particular soxB1 factor is lost. These data, together with more extensive quantification of changes, will be included in a revised version of the manuscript. At this time the only consistent change was the expansion of sox1a into the trailing zone when sox2 function is lost. This change reflects the dependence of sox1a on Wnt activity and the fact that Wnt activity expands into the trailing zone when sox2 function is lost.

      Author response image 3.

      Reviewer #3:

      Given that the expression patterns of Sox1a and Sox3 are not merely different but are largely reciprocal, the mechanistic basis of their very similar double mutant phenotypes with Sox2 remains opaque.

      The simplest way to think about compensation for gene function in a network is to think of it being determined by expression of a homolog or another gene with a similar function in a similar or overlapping domain. However, it is more useful to think of Sox2 function in the primordium as part of an interacting network of SoxB1 factors whose differential regulatory mechanisms create a robust system that simultaneously regulates two key aspects of Wnt activity in the primordium: how high Wnt activity is allowed to get in the leading zone and how effectively it is shut off to facilitate protoneuromast maturation in the trailing zone. These features of Wnt activity influence both when and where nascent protoneuromasts will form in the wake of a progressively shrinking Wnt system and where they undergo effective maturation and stabilization prior to deposition. Changes in individual SoxB1 expression patterns provide some hints about how some SoxB1 factors may compensate when function of one or more of these factors is compromised. However, a deeper understanding of robustness and “compensation” will require a systems-level understanding of this gene regulatory network with computational models, which we are currently working on in our group. It remains possible, for example, that how far into the trailing zone the Wnt activity has an influence is regulated at least in part by how high it is allowed to get in the leading zone by sox1a. Conversely, how high Wnt activity gets in the leading zone may be influenced by how effectively it is shut off in the trailing zone by sox2 and sox3, as this influences the size of the Wnt system, which in turn can influence the overall level of Wnt activity. In this manner Sox1a may cooperate with Sox2 and Sox3 to limit both how high Wnt activity is allowed to get in the primordium and to effectively shut it off in the trailing zone.

      Reviewer #3:

      Related to this, the authors discuss that Sox1a/Sox2 double knockdown produces a more severe phenotype than Sox2/Sox3 double knockdown, yet this difference is not obviously reflected in the data.

      The severity of the sox1a/sox2 double mutant phenotype compared to that of the sox2/sox3 double mutant is shown in Figure 3 K and N, and quantified in Supplementary Figure 3A. Simultaneous loss of sox2 and sox3 results in a small but relatively penetrant delay in where the first stable neuromast is deposited (Figure 3 N). By contrast, loss of sox2 and sox1a together consistently results in a longer delay in deposition of the first stable neuromast (Figure 3 K). A new graph, shown below, which will be incorporated in the revised paper, shows that there is a significant difference in the pattern of L1 deposition in sox1a<sup>-/-</sup>, sox2<sup>-/-</sup> and sox2<sup>-/-</sup>, sox3<sup>-/-</sup> double mutants.

      Author response image 4.

      All three datasets were found to be normally distributed by the Shapiro-Wilk test. One-way ANOVA showed significance (p < 0.0001), with Tukey’s multiple comparisons test showing a significant difference between all three conditions (***p = 0.0008, ****p < 0.0001).

      Reviewer #1:

      It would be good to more clearly state why sox3 is not regulated by Wnt given its expression is inhibited by the delta TCF construct (Figure 2M).

      Explaining why we believe sox3 expression is determined by Fgf signaling, and not by Wnt activity, requires integrating what is observed with induction of both the delta TCF construct and the dominant negative Fgf receptor (DN FgfR). Loss of sox3 expression with induced expression of the delta TCF construct could result from loss of Wnt activity or from the downstream loss of Fgf activity, which is ultimately dependent on Fgfs secreted by Wnt-active cells in the leading domain. Distinguishing between these possibilities is based on inhibition of Fgf signaling with the DN FgfR, described in the next paragraph. Heat shock-induced expression of DN FgfR results in loss of Fgf signaling and the simultaneous expansion of Wnt activity into the trailing zone. As explained in the original text, loss of sox3 expression in this context, rather than its expansion, suggests its expression is determined by Fgf signaling, not Wnt activity. We will emphasize that its loss, rather than its expansion, following induction of DN FgfR indicates that its expression is determined by Fgf signaling, not Wnt activity.

      Reviewer #2:

      The manuscript lacks quantification of many of the experiments, making it difficult to conclude their significance.

      One of the biggest inadvertent omissions of the paper was the inadequate quantification of some of the results. Quantification of results with considerable variation in the outcome, like the pattern of L1 deposition, was provided following manipulations where various combinations of sox1a, sox2, and sox3 function were lost (Figure 3, supplementary Figures 2 and 3) or where sox2MO1/sox3MO was used with or without IWR (Figure 5 and Figure 6). However, numbers for the experiments in Figure 2 were omitted from the figure legend; typically about 10 embryos for each manipulation were photographed and scored, and a representative image was used to make the figure. In these experiments there was a very consistent result, with 100% of the embryos showing the changes represented by each panel in Figure 2. The only exception was Figure 2Y, where 9/10 embryos showed the described change. Similarly, in Figure 4 there was a consistent result and 100% of embryos showed the change shown. Numbers and statistics for these results will be included in a revised manuscript.

      Reviewer #2:

      The statistical analysis in Figure 5 and Supplementary Figures 2 and 3 should be one-way ANOVA or Kruskal-Wallis with a Dunn's multiple comparisons test rather than pair-wise comparisons.

      The analysis has been re-done following the reviewer’s suggestions. The analysis confirms the primary conclusions of the original submission, and this analysis will be incorporated in a revised manuscript. However, to improve the power of the analysis, experiments with low numbers of embryos will be repeated.

      See redone graphs in Figure 5 and supplementary Figure 2 and 3.
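
      For readers unfamiliar with the omnibus test the reviewer suggests, the following is a minimal, standard-library Python sketch of the Kruskal-Wallis H statistic. It is not the analysis code used for the redone graphs; the function name and the example measurements are hypothetical, and Dunn's post hoc test is not implemented here.

```python
import math
from collections import Counter


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic using average ranks and a tie correction."""
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)

    # Assign each distinct value the mean of the 1-based ranks it occupies.
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j

    # H compares the rank sum of each group with what chance would give.
    rank_term = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12.0 / (n * (n + 1)) * rank_term - 3 * (n + 1)

    # Standard correction for tied values.
    ties = Counter(pooled).values()
    correction = 1 - sum(t ** 3 - t for t in ties) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are identical; H is undefined")
    return h / correction


# Hypothetical measurements for three conditions (illustration only).
groups = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
h = kruskal_wallis_h(groups)

# With exactly three groups, H is referred to a chi-square distribution
# with 2 degrees of freedom, whose survival function is exp(-H / 2).
# This is a large-sample approximation and is rough for small groups.
p_approx = math.exp(-h / 2)
```

      A single omnibus test of this kind is followed by a multiple-comparisons procedure only when it is significant, which avoids the inflated false-positive rate of running every pair-wise comparison separately.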

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In their manuscript entitled 'The domesticated transposon protein L1TD1 associates with its ancestor L1 ORF1p to promote LINE-1 retrotransposition', Kavaklıoğlu and colleagues delve into the role of L1TD1, an RNA binding protein (RBP) derived from a LINE1 transposon. L1TD1 proves crucial for maintaining pluripotency in embryonic stem cells and is linked to cancer progression in germ cell tumors, yet its precise molecular function remains elusive. Here, the authors uncover an intriguing interaction between L1TD1 and its ancestral LINE-1 retrotransposon.

      The authors delete the DNA methyltransferase DNMT1 in a haploid human cell line (HAP1), inducing widespread DNA hypo-methylation. This hypomethylation prompts abnormal expression of L1TD1. To scrutinize L1TD1's function in a DNMT1 knock-out setting, the authors create DNMT1/L1TD1 double knock-out cell lines (DKO). Curiously, while the loss of global DNA methylation doesn't impede proliferation, additional depletion of L1TD1 leads to DNA damage and apoptosis.

      To unravel the molecular mechanism underpinning L1TD1's protective role in the absence of DNA methylation, the authors dissect L1TD1 complexes in terms of protein and RNA composition. They unveil an association with the LINE-1 transposon protein L1-ORF1 and LINE-1 transcripts, among others.

      Surprisingly, the authors note fewer LINE-1 retro-transposition events in DKO cells than in DNMT1 KO alone.

      Strengths:

      The authors present compelling data suggesting the interplay of a transposon-derived human RNA binding protein with its ancestral transposable element. Their findings spur interesting questions for cancer types, where LINE1 and L1TD1 are aberrantly expressed.

      Weaknesses:

      Suggestions for refinement:

      The initial experiment, inducing global hypo-methylation by eliminating DNMT1 in HAP1 cells, is intriguing and warrants a more detailed description. How many genes experience misregulation or aberrant expression? What phenotypic changes occur in these cells?

      The transcriptome analysis of DNMT1 KO cells showed hundreds of deregulated genes upon DNMT1 ablation. As expected, the majority were up-regulated and gene ontology analysis revealed that among the strongest up-regulated genes were gene clusters with functions in “regulation of transcription from RNA polymerase II promoter” and “cell differentiation” and genes encoding proteins with KRAB domains. In addition, the de novo methyltransferases DNMT3A and DNMT3B were up-regulated in DNMT1 KO cells suggesting the set-up of compensatory mechanisms in these cells. We will include this data set in the revised version of the manuscript.

      Why did the authors focus on L1TD1? Providing some of this data would be helpful to understand the rationale behind the thorough analysis of L1TD1.

      We have previously discovered that conditional deletion of the maintenance DNA methyltransferase DNMT1 in the murine epidermis results not only in the up-regulation of mobile elements, such as IAPs, but also in the induced expression of L1TD1 (Beck et al, 2021; Suppl. Table 1 and Author response image 1). Similarly, L1TD1 expression was induced by treatment of primary human keratinocytes or squamous cell carcinoma cells with the DNMT inhibitor aza-deoxycytidine (Author response image 2 and 3). These findings are in accordance with the observation that inhibition of DNA methyltransferase activity by aza-deoxycytidine in human non-small cell lung cancer cells (NSCLCs) results in upregulation of L1TD1 (Altenberger et al, 2017). Our interest in L1TD1 was further fueled by reports on a potential function of L1TD1 as a prognostic tumor marker. We will include this information in the revised manuscript.

      Author response image 1.

      RT-qPCR of L1TD1 expression in cultured murine control and Dnmt1 Δ/Δker keratinocytes. mRNA levels of L1td1 were analyzed in keratinocytes isolated at P5 from conditional Dnmt1 knockout mice (Beck et al., 2021). Hprt expression was used for normalization of mRNA levels and wildtype control was set to 1. Data represent means ±s.d. with n=4. **P < 0.01 (paired t-test).

      Author response image 2.

      RT-qPCR analysis of L1TD1 expression in primary human keratinocytes. Cells were treated with 5-aza-2′-deoxycytidine for 24 hours or 48 hours, with PBS for 48 hours or were left untreated. 18S rRNA expression was used for normalization of mRNA levels and PBS control was set to 1. Data represent means ±s.d. with n=3. **P < 0.01 (paired t-test).

      Author response image 3.

      Induced L1TD1 expression upon DNMT inhibition in squamous cell carcinoma cell lines SCC9 and SCCO12. Cells were treated with 5-aza-2′-deoxycytidine for 24 hours, 48 hours or 6 days. (A) Western blot analysis of L1TD1 protein levels using beta-actin as loading control. (B) Indirect immunofluorescence microscopy analysis of L1TD1 expression in SCC9 cells. Nuclear DNA was stained with DAPI. Scale bar: 10 µm. (C) RT-qPCR analysis of L1TD1 expression in primary human keratinocytes. Cells were treated with 5-aza-2′-deoxycytidine for 24 hours or 48 hours, with PBS for 48 hours or were left untreated. 18S rRNA expression was used for normalization of mRNA levels and PBS control was set to 1. Data represent means ±s.d. with n=3. *P < 0.05, **P < 0.01 (paired t-test).

      The finding that L1TD1/DNMT1 DKO cells exhibit increased apoptosis and DNA damage but decreased L1 retro-transposition is unexpected. Considering the DNA damage associated with retro-transposition and the DNA damage and apoptosis observed in L1TD1/DNMT1 DKO cells, one would anticipate the opposite outcome. Could it be that the observation of fewer transposition-positive colonies stems from the demise of the most transposition-positive colonies? Further exploration of this phenomenon would be intriguing.

      This is an important point and we were aware of this potential problem. Therefore, we calibrated the retrotransposition assay by transfection with a blasticidin resistance gene vector to take into account potential differences in cell viability and blasticidin sensitivity. Thus, the observed reduction in L1 retrotransposition efficiency is not an indirect effect of reduced cell viability.

      Based on previous studies with hESCs, it is likely that, in addition to its role in retrotransposition, L1TD1 has additional functions in the regulation of cell proliferation and differentiation. L1TD1 might therefore attenuate the effect of DNMT1 loss in KO cells generating an intermediate phenotype (as pointed out by Reviewer 2) and simultaneous loss of both L1TD1 and DNMT1 results in more pronounced effects on cell viability.

      Reviewer #2 (Public Review):

      In this study, Kavaklıoğlu et al. investigated and presented evidence for the role of domesticated transposon protein L1TD1 in enabling its ancestral relative, L1 ORF1p, to retrotranspose in HAP1 human tumor cells. The authors provided insight into the molecular function of L1TD1 and shed some clarifying light on previous studies that showed somewhat contradictory outcomes surrounding L1TD1 expression. Here, L1TD1 expression was correlated with L1 activation in a hypomethylation-dependent manner, due to DNMT1 deletion in the HAP1 cell line. The authors then identified L1TD1-associated RNAs using RIP-Seq, which displays a disconnect between transcript and protein abundance (via Tandem Mass Tag multiplex mass spectrometry analysis). The one exception was for L1TD1 itself, which is consistent with a model in which the RNA transcripts associated with L1TD1 are not directly regulated at the translation level. Instead, the authors found the L1TD1 protein associated with L1-RNPs, and this interaction is associated with increased L1 retrotransposition, at least in the context of HAP1 cells. Overall, these results support a model in which L1TD1 is restrained by DNA methylation, but in the absence of this repressive mark, L1TD1 is expressed and collaborates with L1 ORF1p (either directly or through interaction with L1 RNA, which remains unclear based on current results), leading to enhanced L1 retrotransposition. These results establish the feasibility of this relationship existing in vivo in either development, disease, or both.

    1. Author Response

      We are grateful for the constructive comments of the reviewers and for the succinct assessment of our work by the editors. Here we provide a brief summary of our responses to the major criticisms of our reviewers. We will give a detailed point-by-point response soon, when we upload a revision of our paper.

      1) The MATLAB code for the spatial autocorrelation analysis is now freely available at the following site: https://github.com/dcsabaCD225/Moran_Matlab/blob/main/moran_local.m. If any questions arise during its implementation, please contact Csaba Dávid (david.csaba@koki.hu).

      2) Concerning the computer resources and times required to perform Moran’s I image analysis, here we provide a brief description of the hardware and the calculations for images with different sizes.

      Hardware used for performing the analysis:

      Intel(R) Xeon(R) Silver 4112 CPU @ 2.60GHz, 2594 MHz, 4-core CPU, 64 GB RAM, NVIDIA GeForce GTX 1080 graphics card.

      MATLAB R2021b software was used for implementation.

      Computation times are shown in Author response table 1.

      Author response table 1.

      3) In response to the comment:

      “While the method's avoidance of AI training appeals to those lacking computational know-how and shows improved accuracy over basic threshold-based techniques, there are valid concerns regarding its performance in comparison to advanced methodologies”.

      Comparison of Moran’s I image analysis with AI-based segmentation raises conceptual problems, which will be addressed in detail in the revised version. Briefly, AI-based analyses presuppose that the ground truth is known, so that the AI can learn from a large training set to extract the relevant information for image segmentation. In several cases, however (such as protein distribution in the membrane), the ground truth is not known and cannot be easily determined by any single observer. Defining spatial inhomogeneities in protein distribution, and differentiating proteins involved in clusters from those not involved, is highly subjective. Indeed, our analysis showed that the 23 expert human observers varied hugely in establishing the boundaries of a protein cluster. As a consequence, establishing and using a training set would be highly contentious in these cases. In an average laboratory setting, generating a training set from hundreds of images examined by two dozen people would not be impossible, but it is hardly practical. The beauty of Moran’s I analysis is that it can extract the relevant signals from images generated under different, often noisy conditions using a simple algorithm that allows quantitative characterization and identification of changes in many biological and non-biological samples.
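
      As a rough illustration of the kind of computation involved, the sketch below computes the global Moran's I of a small 2-D intensity grid in plain Python. It is not a port of the authors' moran_local.m (whose file name suggests a local variant, implemented in MATLAB); the rook (4-neighbour) adjacency, the unit weights, the function name, and the toy images are all assumptions made for illustration.

```python
def morans_i(image):
    """Global Moran's I of a 2-D grid with rook adjacency and unit weights."""
    rows, cols = len(image), len(image[0])
    values = [v for row in image for v in row]
    n = len(values)
    mean = sum(values) / n

    # Total squared deviation from the mean (the normalizing term).
    denom = sum((v - mean) ** 2 for v in values)
    if denom == 0:
        raise ValueError("image is constant; Moran's I is undefined")

    # Sum of deviation products over every ordered pair of neighbours.
    num = 0.0
    weight_sum = 0
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    num += (image[r][c] - mean) * (image[rr][cc] - mean)
                    weight_sum += 1

    return (n / weight_sum) * (num / denom)


# Toy images (illustration only).
# Checkerboard: every pixel differs from all of its neighbours.
checker = [[(r + c) % 2 for c in range(4)] for r in range(4)]
# Two-block image: bright left half, dark right half.
clustered = [[1, 1, 0, 0] for _ in range(4)]

i_checker = morans_i(checker)      # strongly negative: dispersed signal
i_clustered = morans_i(clustered)  # positive: spatially clustered signal
```

      Values near +1 indicate that similar intensities sit next to each other, values near −1 indicate alternation, and values near 0 indicate no spatial structure; no training set or ground-truth annotation is needed to obtain them.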

    1. Author response:

      We deeply appreciate the editors’ and reviewers’ invaluable time and effort. We would also like to extend our gratitude to eLife for its unwavering commitment to a transparent review and publication model. Below, we present our point-by-point responses to the comments.  

      Besides the WT allele, equivalent to the mouse TMEM173 gene, the human TMEM173 gene has two common alleles: the HAQ and AQ alleles, carried by billions of people. The main conclusions and interpretations, summarized in the Title and Abstract, are: (i) unlike the WT TMEM173 allele, the HAQ and AQ alleles are resistant to STING activation-induced cell death; (ii) STING residue 293 is critical for cell death; (iii) the HAQ and AQ alleles are dominant to the SAVI allele; (iv) one copy of the AQ allele rescues the SAVI disease in mice. We propose that STING research and STING-targeting immunotherapy should consider human TMEM173 heterogeneity. These interpretations and conclusions are based on data and logic. We welcome alternative, logical interpretations from our peers and potential collaborations to advance human TMEM173 research.

      Reviewer #1 (Public Review):

      Responses to Comment 1: We greatly appreciate Reviewer 1's insights. We will change “lymphocytes” to “splenocytes” (line 134) as suggested. We respectfully disagree with Reviewer 1’s comments on TBK1 (lines 129 – 134). First, we used two different TBK1 inhibitors: BX795 and GSK8612. Second, because BX795 also inhibits PDK1, we used a PDK1 inhibitor, GSK2334470. Third, both BX795 and GSK8612 completely inhibited diABZI-induced splenocyte cell death (Figure 1B). The logical conclusion is that “TBK1 activation is required for STING-mediated mouse spleen cell death ex vivo” (line 118).

      This manuscript uncovers a significant aspect of the interplay between the common human TMEM173 alleles and the rare SAVI mutation (lines 23-26). Our discovery that the common human TMEM173 alleles are resistant to STING activation-induced cell death is a substantial finding. It further strengthens the argument that the HAQ and AQ alleles are functionally distinct from the WT allele 1-3. We wish to underscore the crucial message of this study-that 'STING research and STING-targeting immunotherapy should consider TMEM173 heterogeneity in humans' (line 37), which has been largely overlooked in current STING clinical trials 4.  

      Regarding STING-cell death, we stated in the Introduction (lines 62-79) that: (i) STING-mediated cell death is cell type-dependent 5-7 and type I IFN-independent 5,7,8; (ii) the in vivo biological significance of STING-mediated cell death is not clear 7,8; (iii) the mechanisms of STING-cell death remain controversial, with multiple cell death pathways proposed, i.e., apoptosis, necroptosis, pyroptosis, ferroptosis, and PANoptosis 7,9,10. SAVI patients (WT/SAVI) and mouse models had CD4 T cellpenia 8,11. SAVI/HAQ and SAVI/AQ restored T cells in mice. Thus, the manuscript provides some answers regarding the biological significance of STING-cell death. Next, splenocytes from Q293/Q293 mice are resistant to STING cell death. The logical conclusion is that amino acid 293 is critical for STING cell death. How aa293 mediates this function needs future investigation. Similarly, how TBK1 mediates STING cell death, independent of type I IFNs and NFκB induction, needs future investigation.

      Responses to Comment 2: These are all very interesting questions that we will address in future studies. This manuscript, titled “The common TMEM173 HAQ, AQ alleles rescue CD4 T cellpenia, restore T-regs, and prevent SAVI (N153S) inflammatory disease in mice”, does not focus on Q293 mice. We have been researching the common human TMEM173 alleles since 2011, from their discovery 12 to mouse models 1,3, a human clinical trial 2, and human genetics studies 3. This manuscript is another step towards understanding these common human TMEM173 alleles, with the new discovery that HAQ and AQ are resistant to STING cell death.

      Responses to Comment 3: We aim to address these worthy questions in future studies. In this manuscript, Figure 6 shows AQ/SAVI had more T-regs than HAQ/SAVI (lines 246 – 256). In our previous publication on HAQ, AQ knockin mice, we showed that AQ T-regs have more IL-10 and mitochondrial activity than HAQ T-regs 3. We propose that increased IL-10+ Tregs in AQ mice may contribute to an improved phenotype in AQ/SAVI compared to HAQ/SAVI. However, we are not excluding other contributions (e.g. metabolic difference) by the AQ allele. We will explore these possibilities in future research.

      Responses to Comment 4: Figure 2 is necessary because it reveals the difference between mouse and human STING cell death. Figure 2A-2B showed that STING activation killed human CD4 T cells, but not human CD8 T cells or B cells. This observation is different from Figure 1A, where STING activation killed mouse CD4, CD8 T cells, and CD19 B cells, revealing the species-specific STING cell death responses. Regarding human CD8 T cells, as we stated in the Discussion (lines 318-320), human CD8 T cells (PBMC) are not as susceptible as the CD4 T cells to STING-induced cell death 8. We used lung lymphocytes that showed similar observations (Figure 2A). For Figure 2C, we used 2 WT/HAQ and 3 WT/WT individuals (lines 738-739). We generated HAQ, AQ THP-1 cells in STING-KO THP-1 cells (Invivogen, cat no. thpd-kostg) (lines 740-741).

      A recent study found that STING agonist SHR1032 induces cell death in STING-KO THP-1 cells expressing WT(R232) human STING 10 (line 182) independent of type I IFNs. SHR1032 suppressed THP1-STING-WT(R232) cell growth at GI50: 23 nM while in the parental THP1-STING-HAQ cells, the GI50 of SHR1032 was >103 nM 10. Cytarabine was used as an internal control where SHR1032 killed more robustly than cytarabine in the THP1-STING-WT(R232) cells but much less efficiently than cytarabine in the THP1-STING-HAQ cells 10.

      This manuscript rigorously uses mouse splenocytes, human lung lymphocytes, THP-1 reconstituted with HAQ, AQ, and HAQ/SAVI, AQ/SAVI mice, to demonstrate that the common human HAQ, AQ alleles are resistant to STING cell death in vitro and in vivo.

      We agree with Reviewer 1 that STING-mediated cell death mechanisms in myeloid and lymphoid cells may be different and likely contribute to the different mechanisms proposed in STING cell death research 7,9,10. Our study focuses on the in vivo mechanism of T cellpenia.

      Responses to Comment 5: We stated in the Introduction that “AQ responds to CDNs and produce type I IFNs in vivo and in vitro 3,13,14” (lines 94, 95). We reported that the AQ knock-in mice responded to STING activation 3. We previously showed that there was a negative natural selection on the AQ allele in individuals outside of Africa 3. 28% of Africans are WT/AQ but only 0.6% of East Asians are WT/AQ 3. Future research on the AQ allele will address this interesting question that may shed new mechanistic light on STING action.

      Responses to Comment 6: The comment here is similar to Comment 3. In this manuscript, Figure 6 shows AQ/SAVI had more T-regs than HAQ/SAVI (lines 246 – 256). In our previous publication on HAQ, AQ knockin mice, we showed that AQ T-regs have more IL-10 and mitochondrial activity than HAQ T-regs 3. We propose that increased IL-10+ Tregs in AQ mice may contribute to an improved phenotype in AQ/SAVI compared to HAQ/SAVI. However, we are not excluding other contributions (e.g. metabolic difference) by the AQ allele.

      Responses to Comment 7: Both radioresistant parenchymal and/or stromal cells and hematopoietic cells influence SAVI pathology in mice 15,16. Nevertheless, the lack of CD4 T cells, including the anti-inflammatory T-regs, likely contributes to the inflammation in SAVI mice and patients. We characterized lung function, lung inflammation (Figure 4), lung neutrophils, and inflammatory monocyte infiltration (Figure S4).

      Responses to Comment 8: Several publications have linked STING to HIV pathogenesis 17-22 (line 271). The manuscript studies STING activation-induced cell death. It is not a stretch to ask, for example, whether preventing STING cell death, without affecting type I IFN production, could restore CD4 T cell counts and improve care for AIDS patients.

      Reviewer #2 (Public Review):

      Response to Comment 1: Please see Author response image 1 below for cell death induced by diABZI or DMXAA in splenocytes from WT/WT, WT/HAQ, HAQ/SAVI, and AQ/SAVI mice. The HAQ/SAVI and AQ/SAVI splenocytes showed similar partial resistance to STING activation-induced cell death.

      Responses to Comment 2: We examined HAQ, AQ mouse splenocytes, HAQ human lung lymphocytes, THP-1 reconstituted with HAQ, AQ, and HAQ/SAVI, AQ/SAVI mice, to demonstrate that the common human HAQ, AQ alleles are resistant to STING cell death in vitro and in vivo. Additional work in human T cell lines would not add much.

      Responses to Comment 3: This is possibly a misunderstanding. We use BMDM for the purpose of comparing STING signaling (TBK1, IRF3, NFκB, STING activation) by WT/SAVI, HAQ/SAVI, AQ/SAVI. Ideally, we would like to compare STING signaling in CD4 T cells from WT/SAVI to HAQ/SAVI, AQ/SAVI mice. However, WT/SAVI has no CD4 T cells. Here, we are making the assumption that the basic STING signaling (TBK1, IRF3, NFκB, STING activation) is conserved between T cells and macrophages. 

      Responses to Comment 4: Reviewer 2 suggests looking for evidence of inflammation and STING activation in the lungs of HAQ/SAVI, AQ/SAVI. We would like to elaborate further. First, anti-inflammatory treatments, e.g. steroids, DMARDs, IVIG, etanercept, rituximab, nifedipine, amlodipine, etc., all failed in SAVI patients 11. Second, Figure S4 examined lung neutrophils and inflammatory monocyte infiltration. Interestingly, while AQ/SAVI mice had better lung function than HAQ/SAVI mice (Figure 4D, 4E vs 4H, 4I), HAQ/SAVI and AQ/SAVI lungs had comparable neutrophil and inflammatory monocyte infiltration. Last, SAVI is classified as a type I interferonopathy 11, but the lung diseases of SAVI are mainly independent of type I IFNs 23-26. The AQ allele suppresses SAVI in vivo. Understanding the mechanisms by which AQ rescues SAVI can generate curative care for SAVI patients.

      Author response image 1.

      (A-B). Flow cytometry of HAQ/SAVI, AQ/SAVI, WT/WT or WT/HAQ splenocytes treated with diABZI (100 ng/ml) or DMXAA (20 µg/ml) for 24 hrs. Cell death was determined by PI staining. Data are representative of three independent experiments. Graphs represent the mean with error bars indicating s.e.m. p values were determined by one-way ANOVA with Tukey’s multiple comparison test. * p<0.05. n.s: not significant.

      References.

      (1) Patel, S. et al. The Common R71H-G230A-R293Q Human TMEM173 Is a Null Allele. J Immunol 198, 776-787 (2017).

      (2) Sebastian, M. et al. Obesity and STING1 genotype associate with 23-valent pneumococcal vaccination efficacy. JCI Insight 5 (2020).

      (3) Mansouri, S. et al. MPYS Modulates Fatty Acid Metabolism and Immune Tolerance at Homeostasis Independent of Type I IFNs. J Immunol 209, 2114-2132 (2022).

      (4) Sivick, K. E. et al. Comment on "The Common R71H-G230A-R293Q Human TMEM173 Is a Null Allele". J Immunol 198, 4183-4185 (2017).

      (5) Gulen, M. F. et al. Signalling strength determines proapoptotic functions of STING. Nat Commun 8, 427 (2017).

      (6) Kabelitz, D. et al. Signal strength of STING activation determines cytokine plasticity and cell death in human monocytes. Sci Rep 12, 17827 (2022).

      (7) Murthy, A. M. V., Robinson, N. & Kumar, S. Crosstalk between cGAS-STING signaling and cell death. Cell Death Differ 27, 2989-3003 (2020).

      (8) Kuhl, N. et al. STING agonism turns human T cells into interferon-producing cells but impedes their functionality. EMBO Rep 24, e55536 (2023).

      (9) Li, C., Liu, J., Hou, W., Kang, R. & Tang, D. STING1 Promotes Ferroptosis Through MFN1/2-Dependent Mitochondrial Fusion. Front Cell Dev Biol 9, 698679 (2021).

      (10) Song, C. et al. SHR1032, a novel STING agonist, stimulates anti-tumor immunity and directly induces AML apoptosis. Sci Rep 12, 8579 (2022).

      (11) Liu, Y. et al. Activated STING in a vascular and pulmonary syndrome. N Engl J Med 371, 507-518 (2014).

      (12) Jin, L. et al. Identification and characterization of a loss-of-function human MPYS variant. Genes Immun 12, 263-269 (2011).

      (13) Yi, G. et al. Single nucleotide polymorphisms of human STING can affect innate immune response to cyclic dinucleotides. PLoS One 8, e77846 (2013).

      (14) Patel, S. et al. Response to Comment on "The Common R71H-G230A-R293Q Human TMEM173 Is a Null Allele". J Immunol 198, 4185-4188 (2017).

      (15) Gao, K. M. et al. Endothelial cell expression of a STING gain-of-function mutation initiates pulmonary lymphocytic infiltration. Cell Rep 43, 114114 (2024).

      (16) Gao, K. M., Motwani, M., Tedder, T., Marshak-Rothstein, A. & Fitzgerald, K. A. Radioresistant cells initiate lymphocyte-dependent lung inflammation and IFNgamma-dependent mortality in STING gain-of-function mice. Proc Natl Acad Sci U S A 119, e2202327119 (2022).

      (17) Monroe, K. M. et al. IFI16 DNA sensor is required for death of lymphoid CD4 T cells abortively infected with HIV. Science 343, 428-432 (2014).

      (18) Doitsh, G. et al. Cell death by pyroptosis drives CD4 T-cell depletion in HIV-1 infection. Nature 505, 509-514 (2014).

      (19) Jakobsen, M. R., Olagnier, D. & Hiscott, J. Innate immune sensing of HIV-1 infection. Curr Opin HIV AIDS 10, 96-102 (2015).

      (20) Silvin, A. & Manel, N. Innate immune sensing of HIV infection. Curr Opin Immunol 32, 54-60 (2015).

      (21) Altfeld, M. & Gale, M., Jr. Innate immunity against HIV-1 infection. Nat Immunol 16, 554-562 (2015).

      (22) Krapp, C., Jonsson, K. & Jakobsen, M. R. STING dependent sensing - Does HIV actually care? Cytokine Growth Factor Rev 40, 68-76 (2018).

      (23) Luksch, H. et al. STING-associated lung disease in mice relies on T cells but not type I interferon. J Allergy Clin Immunol 144, 254-266 e258 (2019).

      (24) Stinson, W. A. et al. The IFN-gamma receptor promotes immune dysregulation and disease in STING gain-of-function mice. JCI Insight 7 (2022).

      (25) Warner, J. D. et al. STING-associated vasculopathy develops independently of IRF3 in mice. J Exp Med 214, 3279-3292 (2017).

      (26) Fremond, M. L. et al. Overview of STING-Associated Vasculopathy with Onset in Infancy (SAVI) Among 21 Patients. J Allergy Clin Immunol Pract 9, 803-818 e811 (2021).

    1. Author Response:

      Reviewer #1 (Public Review):

      Force sensing and gating mechanisms of the mechanically activated ion channels is an area of broad interest in the field of mechanotransduction. These channels perform important biological functions by converting mechanical force into electrical signals. To understand their underlying physiological processes, it is important to determine gating mechanisms, especially those mediated by lipids. The authors in this manuscript describe a mechanism for mechanically induced activation of TREK-1 (TWIK-related K+ channel). They propose that force induced disruption of ganglioside (GM1) and cholesterol causes relocation of TREK-1 associated with phospholipase D2 (PLD2) to phosphatidylinositol 4,5-bisphosphate (PIP2) clusters, where PLD2 catalytic activity produces phosphatidic acid that can activate the channel. To test their hypothesis, they use dSTORM to measure TREK-1 and PLD2 colocalization with either GM1 or PIP2. They find that shear stress decreases TREK-1/PLD2 colocalization with GM1 and relocates them to cluster with PIP2. These movements are affected by TREK-1 C-terminal or PLD2 mutations suggesting that the interaction is important for channel re-location. The authors then draw a correlation to cholesterol suggesting that TREK-1 movement is cholesterol dependent. It is important to note that this is not the only method of channel activation and that one not involving PLD2 also exists. Overall, the authors conclude that force is sensed by ordered lipids and PLD2 associates with TREK-1 to selectively gate the channel. Although the proposed mechanism is solid, some concerns remain.

      1) Most conclusions in the paper heavily depend on the dSTORM data. But the images provided lack resolution. This makes it difficult for the readers to assess the representative images.

      The images provided are at 300 dpi. Perhaps the reviewer is referring to contrast in Figure 2? We are happy to increase the contrast or resolution.

      As a side note, we feel the main conclusion of the paper, mechanical activation of TREK-1 through PLD2, depends primarily on the electrophysiology in Figure 1b-c, not the dSTORM. But both complement each other.

      2) The experiments in Figure 6 are a bit puzzling. The entire premise of the paper is to establish the gating mechanism of TREK-1 mediated by PLD2; however, the motivation behind using flies, which do not express TREK-1, is puzzling.

      The fly experiment shows that PLD mechanosensitivity is more evolutionarily conserved than TREK-1 mechanosensitivity. We should have made this clearer.

      -Figure 6B, the image is too blown out and looks over saturated. Unclear whether the resolution in subcellular localization is obvious or not.

      Figure 6B is a confocal image; it is not dSTORM. There is no dSTORM in Figure 6. This should have been made clear in the figure legend. For reference, only a few cells would fit in the field of view with dSTORM.

      -Figure 6C-D, the differences in activity threshold is 1 or less than 1g. Is this physiologically relevant? How does this compare to other conditions in flies that can affect mechanosensitivity, for example?

      Yes, 1g is physiologically relevant. It is almost the force needed to wake a fly from sleep (1.2-3.2g). See ref 33 (Murphy, Nature Pro. 2017).

      3) 70mOsm is a high degree of osmotic stress. How confident are the authors that a. cell health is maintained under this condition and b. this does indeed induce membrane stretch? For example, does this stimulation activate TREK-1?

      Yes, osmotic swelling activates TREK-1. This was shown in ref 19 (Patel et al 1998). We agree that 70 mOsm is a high degree of stress. This needs to be stated better in the paper.

      Reviewer #2 (Public Review):

      This manuscript by Petersen and colleagues investigates the mechanistic underpinnings of activation of the ion channel TREK-1 by mechanical inputs (fluid shear or membrane stretch) applied to cells. Using a combination of super-resolution microscopy, pair correlation analysis and electrophysiology, the authors show that the application of shear to a cell can lead to changes in the distribution of TREK-1 and the enzyme phospholipase D2 (PLD2), relative to lipid domains defined by either GM1 or PIP2. The activation of TREK-1 by mechanical stimuli was shown to be sensitized by the presence of PLD2, but not a catalytically dead xPLD2 mutant. In addition, the activity of PLD2 is increased when the molecule is more associated with PIP2, rather than GM1 defined lipid domains. The presented data do not exclude direct mechanical activation of TREK-1, rather suggest a modulation of TREK-1 activity, increasing sensitivity to mechanical inputs, through an inherent mechanosensitivity of PLD2 activity. The authors additionally claim that PLD2 can regulate transduction thresholds in vivo using Drosophila melanogaster behavioural assays. However, this section of the manuscript overstates the experimental findings, given that it is unclear how the disruption of PLD2 is leading to behavioural changes, given the lack of a TREK-1 homologue in this organism and the lack of supporting data on molecular function in the relevant cells.

      We agree, the downstream effectors of PLD2 mechanosensitivity are not known in the fly. Other anionic lipids have been shown to mediate pain (see refs 46 and 47). We do not wish to make any claim beyond PLD2 being an in vivo contributor to a fly’s response to mechanical force.

      That said, we do believe we have established a molecular function at the cellular level. We showed that PLD is robustly mechanically activated in a cultured fly cell line (BG2-c2; Figure 6a of the manuscript). And our previous publication established mechanosensation of PLD (Petersen et al., Nature Comm. 2016) through mechanical disruption of the lipids. At a minimum, the experiments show that PLD's mechanosensitivity is evolutionarily better conserved across species than that of TREK-1.

      This work will be of interest to the growing community of scientists investigating the myriad mechanisms that can tune mechanical sensitivity of cells, providing valuable insight into the role of functional PLD2 in sensitizing TREK-1 activation in response to mechanical inputs, in some cellular systems.

      The authors convincingly demonstrate, post application of shear, an alteration in the distribution of TREK-1 and mPLD2 (in HEK293T cells) from being correlated with GM1 defined domains (no shear) to increased correlation with PIP2 defined membrane domains (post shear). These data were generated using super-resolution microscopy to visualise, at sub diffraction resolution, the localisation of labelled protein, compared to labelled lipids. The use of super-resolution imaging enabled the authors to visualise changes in cluster association that would not have been achievable with diffraction limited microscopy. However, the conclusion that this change in association reflects TREK-1 leaving one cluster and moving to another overinterprets these data, as the data were generated from static measurements of fixed cells, rather than dynamic measurements capturing molecular movements.

      When assessing molecular distribution of endogenous TREK-1 and PLD2, these molecules are described as "well correlated" in C2C12 cells; however, it is challenging to assess what "well correlated" means, precisely, in this context. This limitation is compounded by the conclusion that TREK-1 displayed little pair correlation with GM1 and the authors describe a "small amount of TREK-1 trafficked to PIP2". As such, these data may suggest that the findings outlined for HEK293T cells may be influenced by artefacts arising from overexpression.

      The changes in TREK-1 sensitivity to mechanical activation could also reflect changes in the amount of TREK-1 in the plasma membrane. The authors suggest that the presence of a leak current accounts for the presence of TREK-1 in the plasma membrane; however, they do not account for whether there are significant changes in the membrane localisation of the channel in the presence of mPLD2 versus xPLD2. The supplementary data provide some images of fluorescently labelled TREK-1 in cells, and the authors state that truncating the C-terminus has no effect on expression at the plasma membrane; however, these data provide inadequate support for this conclusion. In addition, the data reporting the P50 should be noted with caution, given the lack of saturation of the current in response to the stimulus range.

      We thank the reviewer for their concern about expression levels. We did test TREK-1 expression. mPLD decreases TREK-1 expression ~two-fold (see Author response image 1). We did not include the mPLD data since TREK-1 was mechanically activated with mPLD. For expression to account for the loss of TREK-1 stretch current (Figure 1b), xPLD would need to block surface expression of TREK-1. The opposite was true: xPLD2 increased TREK-1 expression (see Figure S2c). Furthermore, we tested the leak current of TREK-1 at 0 mV and 0 mmHg of stretch. Basal leak current was no different with xPLD2 compared to endogenous PLD (Figure 1d; red vs grey bars respectively), suggesting TREK-1 is in the membrane and active when xPLD2 is present. If anything, the magnitude of the effect with xPLD would be larger if the expression levels were equal.

      Author response image 1.

      TREK expression at the plasma membrane. TREK-1 fluorescence was measured by GFP at points along the plasma membrane. Overexpression of mouse PLD2 (mPLD) decreased the amount of full-length TREK-1 (FL TREK) on the surface more than 2-fold compared to endogenously expressed PLD (enPLD) or truncated TREK (TREKtrunc), which is missing the PLD binding site in the C-terminus. Overexpression of mPLD had no effect on TREKtrunc.

      Finally, by manipulating PLD2 in D. melanogaster, the authors show changes in behaviour when larvae are exposed to either mechanical or electrical inputs. The depletion of PLD2 is concluded to lead to a reduction in activation thresholds and to suggest an in vivo role for PA lipid signaling in setting thresholds for both mechanosensitivity and pain. However, while the data provided demonstrate convincing changes in behaviour and these changes could be explained by changes in transduction thresholds, these data only provide weak support for this specific conclusion. As the authors note, there is no TREK-1 in D. melanogaster, as such the reported findings could be accounted for by other explanations, not least including potential alterations in the activation threshold of Nav channels required for action potential generation. To conclude that the outcomes were in fact mediated by changes in mechanotransduction, the authors would need to demonstrate changes in receptor potential generation, rather than deriving conclusions from changes in behaviour that could arise from alterations in resting membrane potential, receptor potential generation or the activity of the voltage gated channels required for action potential generation.

      We are willing to restrict the conclusion about the fly behavior as the reviewers see fit. We have shown that PLD is mechanosensitive in a fly cell line, and when we knock out PLD from a fly, the animal exhibits a mechanosensation phenotype.

      This work provides further evidence of the astounding flexibility of mechanical sensing in cells. By outlining how mechanical activation of TREK-1 can be sensitised by mechanical regulation of PLD2 activity, the authors highlight a mechanism by which TREK-1 sensitivity could be regulated under distinct physiological conditions.

      Reviewer #3 (Public Review):

      The manuscript "Mechanical activation of TWIK-related potassium channel by nanoscopic movement and second messenger signaling" presents a new mechanism for the activation of TREK-1 channel. The mechanism suggests that TREK1 is activated by phosphatidic acids that are produced via a mechanosensitive motion of PLD2 to PIP2-enriched domains. Overall, I found the topic interesting, but several typos and unclarities reduced the readability of the manuscript. Additionally, I have several major concerns on the interpretation of the results. Therefore, the proposed mechanism is not fully supported by the presented data. Lastly, the mechanism is based on several previous studies from the Hansen lab, however, the novelty of the current manuscript is not clearly stated. For example, in the 2nd result section, the authors stated, "fluid shear causes PLD2 to move from cholesterol dependent GM1 clusters to PIP2 clusters and this activated the enzyme". However, this is also presented as a new finding in section 3 "Mechanism of PLD2 activation by shear."

      For PLD2 dependent TREK-1 activation. Overall, I found the results compelling. However, two key results are missing. 1. Do HEK cells have endogenous PLD2? If so, it's hard to claim that the authors can measure PLD2-independent TREK1 activation.

      Yes, there is endogenous PLD (enPLD). We calculated the relative expression of xPLD2 vs enPLD. xPLD2 is >10x more abundant (Fig. S3d of Pavel et al PNAS 2020, ref 14 of the current manuscript). Hence, as with anesthetic sensitivity, we expect the xPLD to outcompete the endogenous PLD, which is what we see. This should have been described more carefully in this paper, and the studies that establish this conclusion should have been pointed out.

      2. Does the plasma membrane trafficking of TREK1 remain the same under different conditions (PLD2 overexpression, truncation)? From Figure S2, the truncated TREK1 seems to have very poor trafficking. The change of trafficking could significantly contribute to the interpretation of the data in Figure 1.

      If the PLD2 binding site is removed (TREK-1trunc), yes, the trafficking to the plasma membrane is unaffected by the expression of xPLD and mPLD (Author response image 1 above). For full length TREK1 (FL-TREK-1), co-expression of mPLD decreases TREK expression (Author response image 1) and co-expression with xPLD increases TREK expression (Figure S2). This is exactly the opposite of what one would expect if surface expression accounted for the change in pressure currents. Hence, we conclude that surface expression does not account for the loss of TREK-1 mechanosensitivity with xPLD2.

      For shear-induced movement of TREK1 between nanodomains. The section is convincing, however I'm not an expert on super-resolution imaging. Also, it would be helpful to clarify whether the shear stress was maintained during fixation. If not, what is the time gap between reduced shear and the fixed state? Lastly, it's unclear why shear flow changes the level of TREK1 and PIP2.

      Shear was maintained during the fixing. We do not know why shear changes PIP2 and TREK-1 levels. Presumably endocytosis and/or release of other lipid modifying enzymes affect the system. The change in TREK-1 levels appears to be directly through an interaction with PLD, as TREKtrunc is not affected by overexpression of xPLD or mPLD.

      For the mechanism of PLD2 activation by shear. I found this section not convincing. Therefore, the question of how PLD2 senses mechanical force on the membrane is not fully addressed. Particularly, it's hard to imagine an acute 25% decrease in cholesterol level by shear - where did the cholesterol go? Details on the measurements of free cholesterol level are unclear and additional/alternative experiments are needed to prove the reduction in cholesterol by shear.

      The question “how does PLD2 sense mechanical force on the membrane” we addressed and published in Nature Comm. in 2016. The title of that paper is “Kinetic disruption of lipid rafts is a mechanosensor for phospholipase D” (see ref 13, Petersen et al.). PLD is a soluble protein associated with the membrane through palmitoylation. There is no transmembrane domain, which narrows the possible mechanism of its mechanosensation to disruption.

      The Nature Comm. reviewer identified as “an expert in PLD signaling” wrote the following of our data and the proposed mechanism:

      "This is a provocative report that identifies several unique properties of phospholipase D2 (PLD2). It explains in a novel way some long established observations including that the enzyme is largely regulated by substrate presentation which fits nicely with the authors model of segregation of the two lipid raft domains (cholesterol ordered vs PIP2 containing). Although PLD has previously been reported to be involved in mechanosensory transduction processes (as cited by the authors) this is the first such report associating the enzyme with this type of signaling... It presents a novel model that is internally consistent with previous literature as well as the data shown in this manuscript. It suggests a new role for PLD2 as a force transduction tied to the physical structure of lipid rafts and uses parallel methods of disruption to test the predictions of their model."

      Regarding cholesterol. We use a fluorescent cholesterol oxidase assay, which we described in the methods. This is an appropriate assay for determining cholesterol levels in a cell, which we use routinely. We have published in multiple journals using this method; see references 28, 30, 31. Working out the metabolic fate of cholesterol after shear is indeed interesting but well beyond the scope of this paper. Furthermore, we indirectly confirmed our finding using dSTORM cluster analysis (Figure 3d-e). The cluster analysis shows a decrease in GM1 cluster size consistent with our previous experiments where we chemically depleted cholesterol and saw a similar decrease in cluster size (see ref 13). All the data are internally consistent, and the cholesterol assay is properly done. We see no reason to reject the data.

      Importantly, there is no direct evidence for "shear thinning" of the membrane and the authors should avoid claiming shear thinning in the abstract and summary of the manuscript.

      We previously established a kinetic model for PLD2 activation see ref 13 (Petersen et al Nature Comm 2016). In that publication we discussed both entropy and heat as mechanisms of disruption. Here we controlled for heat which narrowed that model to entropy (i.e., shear thinning) (see Figure 3c). We provide an overall justification below. But this is a small refinement of our previous paper, and we prefer not to complicate the current paper. We believe the proper rheological term is shear thinning. The following justification, which is largely adapted from ref 13, could be added to the supplement if the reviewer wishes.

      Justification: To establish shear thinning in a biological membrane, we initially used a soluble enzyme that has no transmembrane domain, phospholipase D2 (PLD2). PLD2 is a soluble enzyme associated with the membrane by palmitate, a saturated 16 carbon lipid attached to the enzyme. In the absence of a transmembrane domain, mechanisms of mechanosensation involving hydrophobic mismatch, tension, midplane bending, and curvature can largely be excluded. Rather, the mechanism appears to be a change in fluidity (i.e., kinetic in nature). GM1 domains are ordered, and the palmitate forms van der Waals bonds with the GM1 lipids. The bonds must be broken for PLD to no longer associate with GM1 lipids. We established this in our 2016 paper, ref 13. In that paper we called it a kinetic effect; however, we did not experimentally distinguish enthalpy (heat) vs. entropy (order). Heat is Newtonian and entropy (i.e., shear thinning) is non-Newtonian. In the current study we paid closer attention to the heat and ruled it out (see Figure 3c and methods). We could propose a mechanism based on kinetic disruption, but we know the disruption is not due to melting of the lipids (enthalpy), which leaves shear thinning (entropy) as the plausible mechanism.

      The authors should also be aware that hypotonic shock is a very dirty assay for stretching the cell membrane. Often, there is only a transient increase in membrane tension, accompanied by many biochemical changes in the cells (including acidification, changes of concentration etc). Therefore, I would not consider this as definitive proof that PLD2 can be activated by stretching membrane.

      Comment noted. We trust the reviewer is correct. In 1998 osmotic shock was used to activate the channel. We only intended to show that the system is consistent with previous electrophysiologic experiments.

    1. Author response:

      Reviewer #1 (Public Review):

      Summary:

      For many years, there has been extensive electrophysiological research investigating the relationship between local field potential patterns and individual cell spike patterns in the hippocampus. In this study, using state-of-the-art imaging techniques, they examined spike synchrony of hippocampal cells during locomotion and immobility states. In contrast to conventional understanding of the hippocampus, the authors demonstrated that hippocampal place cells exhibit prominent synchronous spikes locked to theta oscillations.

      Strengths:

      The voltage imaging used in this study is a highly novel method that allows recording not only suprathreshold-level spikes but also subthreshold-level activity. With its high frame rate, it offers time resolution comparable to electrophysiological recordings. Moreover, it enables the visualization of actual cell locations, allowing for the examination of spatial properties (e.g., Figure 4G).

      We thank the reviewer for pointing out the technical novelty of this work.

      Weaknesses:

      There is a notable deviation from several observations obtained through conventional electrophysiological recordings. Particularly, as mentioned below in detail, the considerable differences in baseline firing rates and no observations of ripple-triggered firing patterns raise some concerns about potential artifacts from imaging and analysis, such as cell toxicity, abnormal excitability, and false detection of spikes. While these findings are intriguing if the validity of these methods is properly proven, accepting the current results as new insights is challenging.

      We appreciate the reviewer’s insightful comments regarding the intriguing aspect of our findings. Indeed, the emergence of a novel form of CA1 population synchrony presents exciting implications for hippocampal memory research and beyond.

      While we acknowledge the deviations from conventional electrophysiological recordings, we respectfully contend that these differences do not necessarily imply methodological flaws. All experiments and analyses were conducted with meticulous adherence to established standards in the field.

      Regarding the observed variations in average firing rates, it is important to note the well-documented heterogeneity in CA1 pyramidal neuron firing rates, spanning from 0.01 to 10 Hz, with a skewed distribution toward lower frequencies (Mizuseki et al., 2013). Our exclusion criteria for neurons with low estimated firing rates may have inadvertently biased the selection towards more active neurons. Moreover, prior research has indicated that average firing rates tend to increase during exposure to novel environments (Karlsson et al., 2008), and among deep-layer CA1 pyramidal neurons (Mizuseki et al., 2011). Given our recording setup in a highly novel environment and the predominance of deep CA1 pyramidal neurons in our sample, the observed higher average firing rates could be influenced by these factors. Considering these points, our mean firing rates (3.2 Hz) are reasonable estimations compared to previously reported values obtained from electrophysiological recordings (2.1 Hz in McHugh et al., 1996 and 2.4-2.6 Hz in Buzsaki et al., 2003).

      Regarding concerns about potential cell toxicity, previous studies have shown that Voltron expression and illumination do not significantly alter membrane resistance, membrane capacitance, resting membrane potentials, spike amplitudes, or spike width (see Abdelfattah 2019, Science, Supplementary Figures 11 and 12). In our recordings, imaged neurons exhibit preserved membrane and dendritic morphology during and after experiments (Author response image 1), supporting the absence of significant toxicity.

      Author response image 1.

      Voltron-expressing neurons exhibit preserved membrane and dendritic morphology. (A) Images of two-photon z-stack maximum intensity projection showing Voltron-expressing neurons taken after voltage imaging experiments in vivo. (B) Post-hoc histological images of neurons being voltage-imaged.

      Regarding spike detection, we use validated algorithms (Abdelfattah et al., 2019 and 2023) to ensure robust and reliable detection of spikes. Spiking activity was first separated from slower subthreshold potentials using high-pass filtering. This way, a slow fluorescence increase will not be detected as a spike, even if its amplitude is large. We benchmarked the detection algorithm in computer simulation. The sensitivity and specificity of the algorithm exceed 98% at the signal-to-noise ratio of our recordings. While we acknowledge that a small number of spikes, particularly those occurring later in a burst, might be missed due to their smaller amplitudes (as illustrated in Figures 1 and 2 of the manuscript), we anticipate that any missed spikes would lead to a decrease rather than an increase in synchrony between neurons. Overall, we are confident that spike detection is performed in a rigorous and robust manner.
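
      The filter-then-threshold idea described above can be illustrated with a minimal Python sketch. This is not the validated algorithm of Abdelfattah et al.; the moving-average baseline, the robust noise estimate, and the names `high_pass`, `detect_spikes`, `window`, and `k` are all assumptions chosen for illustration. The synthetic trace contains a slow drift whose amplitude is larger than the spikes, yet only the fast transients cross the threshold.

```python
import math
import statistics

def high_pass(trace, window):
    """Crude high-pass filter: subtract a centered moving-average baseline."""
    n = len(trace)
    half = window // 2
    filtered = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        baseline = sum(trace[lo:hi]) / (hi - lo)
        filtered.append(trace[i] - baseline)
    return filtered

def detect_spikes(trace, window=21, k=5.0):
    """Return indices of local maxima exceeding k robust SDs after filtering.

    `window` (samples) and `k` are illustrative values, not the authors' settings.
    """
    filtered = high_pass(trace, window)
    med = statistics.median(filtered)
    mad = statistics.median([abs(x - med) for x in filtered])
    threshold = med + k * 1.4826 * mad  # 1.4826 scales MAD to a Gaussian SD
    spikes = []
    for i in range(1, len(filtered) - 1):
        is_peak = filtered[i] >= filtered[i - 1] and filtered[i] > filtered[i + 1]
        if is_peak and filtered[i] > threshold:
            spikes.append(i)
    return spikes

# Synthetic trace: large slow drift + small fast ripple + three brief transients.
trace = [10.0 * math.cos(2 * math.pi * i / 1000) + 0.1 * math.sin(12.9898 * i)
         for i in range(1000)]
for i in (100, 400, 700):
    trace[i] += 3.0  # fast, spike-like deflections

spikes = detect_spikes(trace)
```

      Because the drift is removed before thresholding, its amplitude (10 units here) does not trigger detections, whereas the 3-unit transients do.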

      To further strengthen these points, we will include the following in the revision:

      (1) Histological images of recorded neurons during and after experiments.

      (2) Further details regarding the validation of spike detection algorithms.

      (3) Analysis of publicly available electrophysiological datasets.

      (4) Discussion regarding the reasons behind the novelty of some of our findings compared to previous observations.

      In conclusion, we assert that our experimental and analysis approach upholds rigorous standards. We remain committed to reconciling our findings with previous observations and welcome further scrutiny and engagement from the scientific community to explore the intriguing implications of our findings.

      Reviewer #2 (Public Review):

      Summary:

      This study employed voltage imaging in the CA1 region of the mouse hippocampus during the exploration of a novel environment. The authors report synchronous activity, involving almost half of the imaged neurons, occurred during periods of immobility. These events did not correlate with SWRs, but instead, occurred during theta oscillations and were phase-locked to the trough of theta. Moreover, pairs of neurons with high synchronization tended to display non-overlapping place fields, leading the authors to suggest these events may play a role in binding a distributed representation of the context.

      We thank the reviewer for a thorough and thoughtful review of our paper.

      Strengths:

      Technically this is an impressive study, using an emerging approach that allows single-cell resolution voltage imaging in animals, that while head-fixed, can move through a real environment. The paper is written clearly and suggests novel observations about population-level activity in CA1.

      We thank the reviewer for pointing out the technical strength and the novelty of our observations.

      Weaknesses:

      The evidence provided is weak, with the authors making surprising population-level claims based on a very sparse data set (5 data sets, each with less than 20 neurons simultaneously recorded) acquired with exciting, but less tested technology. Further, while the authors link these observations to the novelty of the context, both in the title and text, they do not include data from subsequent visits to support this. Detailed comments are below:

      We understand the reviewer’s concerns regarding the size of the dataset. Despite this limitation, it is important to note that synchronous ensembles beyond what could be expected from chance (jittering) were detected in all examined data. In the revision, we plan to add more data, including data from subsequent visits, to further strengthen our findings.

      (1) My first question for the authors, which is not addressed in the discussion, is why these events have not been observed in the countless extracellular recording experiments conducted in rodent CA1 during the exploration of novel environments. Those data sets often have 10x the neurons recorded simultaneously compared to the present data, thus the highly synchronous firing should be very hard to miss. Ideally, the authors could confirm their claims via the analysis of publicly available electrophysiology data sets. Further, the claim of high extra-SWR synchrony is complicated by the observation that their recorded neurons fail to spike during the limited number of SWRs recorded during behavior, again not agreeing with much of the previous electrophysiological recordings.

      We understand the reviewer’s concern. We will examine publicly available electrophysiology datasets to gain further insights into any similarities and differences to our findings. Based on these results, we will discuss why these events have not been previously observed/reported.

      (2) The authors posit that these events are linked to the novelty of the context, both in the text, as well as in the title and abstract. However, they do not include any imaging data from subsequent days to demonstrate the failure to see this synchrony in a familiar environment. If these data are available it would strengthen the proposed link to novelty if they were included.

      We thank the reviewer for the constructive suggestion. We will acquire more datasets from subsequent visits to gain further insights into these synchronous events.

      (3) In the discussion the authors begin by speculating the theta present during these synchronous events may be slower type II or attentional theta. This can be supported by demonstrating a frequency shift in the theta recording during these events/immobility versus the theta recording during movement.

      We thank the reviewer for the constructive suggestion. We did demonstrate a shift to a lower frequency in the synchrony-associated theta during immobility compared with locomotion (see Fig. 4B, the red vs. blue curves). We will enlarge this panel and specifically refer to it in the corresponding discussion paragraph.

      (4) The authors mention in the discussion that they image deep-layer PCs in CA1, however, this is not mentioned in the text or methods. They should include data, such as imaging of a slice of a brain post-recording with immunohistochemistry for a layer-specific gene to support this.

      We thank the reviewer for the constructive suggestion. We do have images of brain slices post-recordings (Author response image 2). Imaged neurons are clearly located in the deep CA1 pyramidal layer. We will add these images and quantification in the revised manuscript.

      Author response image 2.

      Imaged neurons are located in the deep pyramidal layer of the dorsal hippocampal CA1 region.

      Reviewer #3 (Public Review):

      Summary:

      In the present manuscript, the authors use a few minutes of voltage imaging of CA1 pyramidal cells in head-fixed mice running on a track while local field potentials (LFPs) are recorded. The authors suggest that synchronous ensembles of neurons are differentially associated with different types of LFP patterns, theta and ripples. The experiments are flawed in that the LFP is not "local" but rather collected in the other side of the brain, and the investigation is flawed due to multiple problems with the point process analyses. The synchrony terminology refers to dozens of milliseconds as opposed to the millisecond timescale referred to in prior work, and the interpretations do not take into account theta phase locking as a simple alternative explanation.

      We genuinely appreciate the reviewer’s feedback and acknowledge the concerns raised. However, we believe these concerns can be effectively addressed without undermining the validity of our conclusions. With this in mind, we respectfully disagree with the assessment that our experiments and investigation are flawed. Please allow us to address these concerns and offer additional context to support the validity of our study.

      Weaknesses:

      The two main messages of the manuscript indicated in the title are not supported by the data. The title gives two messages that relate to CA1 pyramidal neurons in behaving head-fixed mice: (1) synchronous ensembles are associated with theta (2) synchronous ensembles are not associated with ripples.

      There are two main methodological problems with the work:

      (1) Experimentally, the theta and ripple signals were recorded using electrophysiology from the opposite hemisphere to the one in which the spiking was monitored. However, both signals exhibit profound differences as a function of location: theta phase changes with the precise location along the proximo-distal and dorso-ventral axes, and importantly, even reverses with depth. And ripples are often a local phenomenon - independent ripples occur within a fraction of a millimeter within the same hemisphere, let alone different hemispheres. Ripples are very sensitive to the precise depth - 100 micrometers up or down, and only a positive deflection/sharp wave is evident.

      We appreciate the reviewer’s consideration regarding the collection of LFP from the contralateral hemisphere. While we acknowledge the limitation of this design, we believe that our findings still offer valuable insights into the dynamics of synchronous ensembles. Despite potential variations in theta phases with recording locations and depth, we find that the occurrence and amplitudes of theta oscillations are generally coordinated across hemispheres (Buzsaki et al., Neurosci., 2003). Therefore, the presence of prominent contralateral LFP theta around the times of synchronous ensembles in our study (see Figure 4A of the manuscript) strongly supports our conclusion regarding their association with theta oscillations, despite the collection of LFP from the opposite hemisphere.

      In addition, in our manuscript, we specifically mentioned that the “preferred phases” varied from session to session, likely due to the variability of recording locations (see Line 254-256). Therefore, we think that the reviewer’s concern regarding theta phase variability has already been addressed in the present manuscript.

      Regarding ripple oscillations, while we recognize that they can sometimes occur locally, the majority of ripples occur synchronously in both hemispheres (up to 70%, see Szabo et al., Neuron, 2022; Buzsaki et al., Neurosci., 2003). Therefore, using contralateral LFP to infer ripple occurrence on the ipsilateral side has been a common practice in the field, employed by many studies published in respectable journals (Szabo et al., Neuron, 2022; Terada et al., Nature, 2021; Dudok et al., Neuron, 2021; Geiller et al., Neuron, 2020). Furthermore, we observed that 446 synchronous ensembles during immobility did not co-occur with contralateral ripples, and the remaining 313 ensembles, which occurred during locomotion, are not associated with ripples, as ripples rarely occur during locomotion. Therefore, our conclusion that synchronous ensembles are not associated with ripple oscillations is supported by data.

      (2) The analysis of the point process data (spike trains) is entirely flawed. There are many technical issues: complex spikes ("bursts") are not accounted for; differences in spike counts between the various conditions ("locomotion" and "immobility") are not accounted for; the pooling of multiple CCGs assumes independence, whereas even conditional independence cannot be assumed; etc.

      We acknowledge the reviewer’s concern regarding spike train analysis. Indeed, complex bursts or different behavioral conditions can lead to differences in spike counts that could potentially affect the detection of synchronous ensembles. However, our jittering procedure (see Line 121-132) is designed to control for the variation of spike counts. Importantly, while the jittered spike trains also contain the same spike count variations, we found 7.8-fold more synchronous events in our data compared to jitter controls (see Figure 1G of the manuscript), indicating that these factors cannot account for the observed synchrony.
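
      The logic of a jitter control can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure of Lines 121-132 of the manuscript: the bin width, the minimum number of co-active cells, the jitter window, and the function names are hypothetical. The key property is that jittering keeps every cell's spike count unchanged while destroying fine temporal alignment, so any excess of synchronous events in the data over the surrogates cannot be attributed to spike counts alone.

```python
import random

def count_synchronous_events(spike_trains, bin_ms=25.0, min_cells=3):
    """Count time bins in which at least `min_cells` distinct cells spike."""
    cells_per_bin = {}
    for cell, train in enumerate(spike_trains):
        for t in train:
            cells_per_bin.setdefault(int(t // bin_ms), set()).add(cell)
    return sum(1 for cells in cells_per_bin.values() if len(cells) >= min_cells)

def jitter(spike_trains, jitter_ms, rng):
    """Shift each spike by an independent uniform offset in [-jitter_ms, +jitter_ms]."""
    return [sorted(t + rng.uniform(-jitter_ms, jitter_ms) for t in train)
            for train in spike_trains]

def jitter_control(spike_trains, jitter_ms=75.0, n_surrogates=200, seed=0,
                   bin_ms=25.0, min_cells=3):
    """Return (observed event count, mean event count across jittered surrogates)."""
    rng = random.Random(seed)
    observed = count_synchronous_events(spike_trains, bin_ms, min_cells)
    surrogate_counts = [
        count_synchronous_events(jitter(spike_trains, jitter_ms, rng), bin_ms, min_cells)
        for _ in range(n_surrogates)
    ]
    return observed, sum(surrogate_counts) / n_surrogates

# Toy data: five cells that all fire together once per second for 20 s.
trains = [[1000.0 * k + 12.5 for k in range(20)] for _ in range(5)]
observed, mean_surrogate = jitter_control(trains)
fold_excess = observed / mean_surrogate if mean_surrogate > 0 else float("inf")
```

      The ratio `fold_excess` plays the role of the fold difference between data and jitter controls; its value here reflects only the toy data.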

      To explicitly demonstrate that complex bursts cannot account for the observed synchrony, we have performed additional analysis to remove all latter spikes in bursts and only count the single and the first spikes of bursts. Importantly, we found that this procedure did not change the rate and size of synchronous ensembles, nor did it significantly alter the grand-average CCG (see Author response image 3). The results of this analysis explicitly rule out a significant effect of complex spikes on the analysis of synchronous ensembles.
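
      Removing the later spikes of bursts amounts to a simple inter-spike-interval filter. The sketch below is an assumption-laden illustration: the burst criterion (`max_isi_ms`, here 10 ms) and the function name are hypothetical, since the criterion used in the analysis is not stated in this response.

```python
def first_spikes_only(spike_times_ms, max_isi_ms=10.0):
    """Keep isolated spikes and the first spike of each burst.

    A spike is treated as a later spike of a burst if it follows the
    previous spike by at most `max_isi_ms` (an illustrative threshold).
    """
    kept = []
    previous = None
    for t in sorted(spike_times_ms):
        if previous is None or t - previous > max_isi_ms:
            kept.append(t)
        previous = t  # compare against the preceding spike, kept or not
    return kept

# One three-spike burst, one isolated spike, one doublet, one isolated spike.
pruned = first_spikes_only([0, 4, 9, 100, 250, 255, 400])
```

      The pruned trains can then be passed through the same synchrony detection as the full trains to compare event rates and ensemble sizes.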

      Author response image 3.

      Population synchrony remains after the removal of spikes in bursts. (A) The grand-average cross correlogram (CCG) was calculated using spike trains without latter spikes in bursts. The gray line represents the mean grand average CCG between reference cells and randomly selected cells from different sessions. (B) Pairwise comparison of the event rates of population synchrony between spike trains containing all spikes and spike trains without latter spikes in bursts. Bar heights indicate group means (n=10 segments, p=0.036, Wilcoxon signed-rank test). (C) Histogram of the ensemble sizes as percentages of cells participating in the synchronous ensembles.

      Beyond those methodological issues, there are two main interpretational problems: (1) the "synchronous ensembles" may be completely consistent with phase locking to the intracellular theta (as even shown by the authors themselves in some of the supplementary figures).

      We agree with the reviewer that the synchronous ensembles are indeed consistent with theta phase locking. However, it is important to note that theta phase locking alone does not necessarily imply population synchrony. In fact, theta phase locking has been shown to “reduce” population synchrony in a previous study (Mizuseki et al., 2014, Phil. Trans. R. Soc. B.). Thus, the presence of theta phase locking cannot be taken as a simple alternative explanation of the synchronous ensembles.

      To directly assess the contribution of theta phase locking to synchronous ensembles, we have performed a new analysis to randomize the specific theta cycles in which neurons spike, while keeping the spike phases constant. This manipulation disrupts spike co-occurrence while preserving theta phase locking, allowing us to test whether theta phase locking alone can explain the population synchrony, or whether spike co-occurrence in specific cycles is required. The grand-average CCG shows a much smaller peak compared to the original peak (Author response image 4A). Moreover, synchronous event rates show a 4.5-fold decrease in the randomized data compared to the original event rates (Author response image 4B). Thus, the new analysis reveals theta phase locking alone cannot account for the population synchrony.
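
      The cycle-randomization control can be sketched as follows, assuming theta cycles are delimited by a sorted list of boundary times (for example, successive troughs). The representation of cycles, the uniform choice of a destination cycle, and the function names are assumptions made for illustration rather than details taken from the analysis. Each spike keeps its fractional phase within a cycle but is moved to a randomly chosen cycle, which preserves phase locking while breaking cycle-by-cycle co-occurrence.

```python
import bisect
import random

def spike_phase(t, cycle_edges):
    """Return (cycle index, fractional phase in [0, 1)) for a spike time."""
    i = bisect.bisect_right(cycle_edges, t) - 1
    if i < 0 or i >= len(cycle_edges) - 1:
        raise ValueError("spike lies outside the annotated theta cycles")
    start, end = cycle_edges[i], cycle_edges[i + 1]
    return i, (t - start) / (end - start)

def shuffle_cycles(spike_times, cycle_edges, rng):
    """Move each spike to a random theta cycle while keeping its phase."""
    n_cycles = len(cycle_edges) - 1
    shuffled = []
    for t in spike_times:
        _, phase = spike_phase(t, cycle_edges)
        j = rng.randrange(n_cycles)  # destination cycle, drawn uniformly
        start, end = cycle_edges[j], cycle_edges[j + 1]
        shuffled.append(start + phase * (end - start))
    return sorted(shuffled)

# Ten idealized 100-ms cycles and three spikes at phases 0.25, 0.25 and 0.75.
edges = [100.0 * k for k in range(11)]
original = [25.0, 325.0, 775.0]
randomized = shuffle_cycles(original, edges, random.Random(0))
phases_after = sorted(round(spike_phase(t, edges)[1], 9) for t in randomized)
```

      Applying the shuffle independently to every neuron and recomputing the CCG or the synchronous event rate gives the kind of comparison reported in Author response image 4.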

      Author response image 4.

      Drastic reduction of population synchrony by randomizing spikes to other theta cycles while preserving the phases. (A) The grand-average cross correlogram (CCG) was calculated using original spike trains (black) and randomized spike trains where theta phases of the spikes are kept the same but spike timings were randomly moved to other theta cycles (red). (B) Pairwise comparison of the event rates of population synchrony between the original spike trains and randomized spike trains (n=10 segments, p=0.002, Wilcoxon signed-rank test). Bar heights indicate group means. ** p<0.01

      (2) The definition of "synchrony" in the present work is very loose and refers to timescales of 20-30 ms. In previous literature that relates to synchrony of point processes, the timescales discussed are 1-2 ms, and longer timescales are referred to as the "baseline" which is actually removed (using smoothing, jittering, etc.).

      Regarding the timescale of synchronous ensembles, we acknowledge that it varies considerably across studies and cell types. However, it is important to note that a timescale of dozens, or even hundreds of milliseconds is common for synchrony terminology in CA1 pyramidal neurons (see Csicsvari et al., Neuron, 2000; Harris et al., Science, 2003; Malvache et al., Science, 2016; Yagi et al., Cell Reports, 2023). In fact, a timescale of 20-30 ms is considered particularly important for information transmission and storage in CA1, as it matches the membrane time constant of pyramidal neurons, the period of hippocampal gamma oscillations, and the time window for synaptic plasticity. Therefore, we believe that this timescale is relevant and in line with established practices in the field.

    1. Author response:

      eLife Assessment

      This useful study integrates experimental methods from materials science with psychophysical methods to investigate how frictional stabilities influence tactile surface discrimination. The authors argue that force fluctuations arising from transitions between frictional sliding conditions facilitate the discrimination of surfaces with similar friction coefficients. However, the reliance on friction data obtained from an artificial finger, together with the ambiguous correlative analyses relating these measurements to human psychophysics, renders the findings incomplete.

      Our main goal with this paper was to show that the most common metric, i.e., the average friction coefficient, which is widely used in tactile perception and device design, is fundamentally unsound, and to offer a secondary parameter that is compatible with the fact that human motion is unconstrained, leading to dynamic interfacial mechanics. In contrast with the summary assessment, we also note that the average friction coefficients in our study were not particularly similar, with differences ranging from 0.4 to 1, a typical range seen in most studies. We believe some of the comments originate from a misinterpretation of our statistically significant, but negative, correlation between human results and friction coefficients, which leads to the spurious conclusion that nearly identical objects should be very easy to tell apart, thus supporting our central argument for the need of an alternative. We understand that the Reviewers want to see a demonstration that humans use instabilities in situ. This is seemingly reasonable, but we explain the significant challenges and fundamental unknowns of those experiments. However, we modified our title to reflect our focus on offering an alternative to the average coefficient of friction.

      We do not think it was feasible, at this stage, to demonstrate that humans use friction instabilities through direct manipulation and observation in human participants. In short, there are still several fundamental unknowns: (1) a decision-making model would need to be created, but it is unknown if tactile decision making follows other models, (2) it is further unknown what constitutes “tactile evidence”, though at our manuscript’s conclusion, we propose that friction instabilities are better suited to be tactile evidence than the averaging of friction coefficients from a narrow range of human exploration, (3) in the design of samples, from a friction mechanics and materials perspective, it is not, at this point, possible to pre-program surfaces a priori to deliver friction instabilities; instead, these must be determined experimentally – especially when attempting to achieve this in controlled surfaces that do not create other overriding tactile cues, like macroscopic bumps or large differences in surface roughness. (4) Given that the basis for tactile percepts, like which object feels “rougher” or “smoother”, is not sufficiently established and, as we have seen, leads to confusion, it is necessary to use a 3-alternative forced choice task, which avoids asking about objects along a preset perceptual dimension – a challenge recognized by Reviewer 3. However, this would bring in issues of memory in the decision-making model. (5) The prior points are compounded by the fact that, we believe, tactile exploration must be performed in an unconstrained manner, i.e., without an apparatus generating motion onto a stationary finger. Work by Liu et al. (IEEE ToH, 2024) showed that recreating friction obtained during free exploration onto a stationary finger was uninterpretable by the participants, hinting at the importance of efference copies(1).

      We believe that resolving each of the above-mentioned issues would constitute a significant advance in knowledge and would require discussion and dissemination with the community. Finally, one of our overarching goals is to create a consistent method to characterize surfaces, and given individual variability in human fingers and motion, a machine-based method that can rapidly, consistently, and sufficiently replicate tactile exploration is needed.

      Finally, we also justify our use of a mock finger to provide a method to characterize surfaces in tactile studies that other researchers could reasonably recreate, without creating a standard around individual humans, considering the variability in finger shape and motion during exploration. We do not believe this is an “either-or” argument, but rather that standardized methods to characterize surfaces and devices are greatly needed in the field. From these standardized methods, like surface roughness, some tabulated values of friction coefficient, or surface energy, etc., the current metrics to parameterize results are largely incapable of capturing the dynamic changes in forces expected during human tactile exploration.

      Our changes to the manuscript (Page 1 & SI Page 1, Title)

      “Alternatives to Friction Coefficient: Role of Frictional Instabilities for Fine Touch Perception”

      Reviewer 1 (Public review):

      Summary:

      In this paper, Derkaloustian et al. look at the important topic of what affects fine touch perception. The observations that there may be some level of correlation with instabilities are intriguing. They attempted to characterize different materials by counting the frequency (occurrence #, not of vibration) of instabilities at various speeds and forces of a PDMS slab pulled lengthwise over the material. They then had humans make the same vertical motion to discriminate between these samples. They correlated the % correct in discrimination with differences in frequency of steady sliding over the design space as well as other traditional parameters such as friction coefficient and roughness. The authors pose an interesting hypothesis and make an interesting observation about the occurrences of instability regimes in different materials while in contact with PDMS, which is interesting for the community to see in the publication. It should be noted that the finger is complex, however, and there are many factors that may be quite oversimplified with the use of the PDMS finger, and the consideration and discounting of other parameters are not fully discussed in the main text or SI. Most importantly, however, the conclusions as stated do not align with the primary summary of the data in Figure 2.

      Strengths:

      The strength of this paper is in its intriguing hypothesis and important observation that instabilities may contribute to what humans are detecting as differences in these apparently similar samples.

      We thank Reviewer 1 for their time on the manuscript, recognizing the approach we took, and offering constructive feedback. We believe that our conclusions are, in fact, supported by the primary summary of the data in Figure 2, but that our use of R<sup>2</sup> could have led to misinterpretation. The trend with friction coefficient and percent correct was indeed statistically significant but was spurious because the slope was negative. In the revision, we add clarifying comments throughout, change from R<sup>2</sup> to r so as to highlight the negative trend, and adjust the figures to better focus on friction coefficient.

      Finally, we added a new section to discuss the tradeoffs between using a real human finger versus a mock finger, and which situations may warrant the use of one or the other. In short, for our goal of characterizing surfaces to be used in tactile experiments, we believe a mock finger is more sustainable and practical than using real humans because human fingers are unique per participant, humans move their fingers at constantly changing pressures and velocities, and friction generated during free human exploration cannot be satisfactorily replicated by moving a sample onto a stationary finger. However, we do not disagree that for other types of experiments, characterizing a human participant directly may be more advantageous.

      Weaknesses:

      Comment 1 - The most important weakness is that the findings do not support the statements of findings made in the abstract. Of specific note in this regard is the primary correlation in Figure 2B between SS (steady sliding) and percent correct discrimination. While the statistical test shows significance (and is interesting!), the R-squared value is 0.38, while the R-squared value for the "Friction Coefficient vs. Percent Correct" plot has an R-squared of 0.6 and a p-value of < 0.01 (including Figure 2B). This suggests that the results do not support the claim in the abstract: "We found that participant accuracy in tactile discrimination was most strongly correlated with formations of steady sliding, and response times were negatively correlated with stiction spikes. Conversely, traditional metrics like surface roughness or average friction coefficient did not predict tactile discriminability."

      We disagree that the trend with friction coefficient suggests the results do not support the claim because the correlation was found to be negative. However, we could have made the comparison more apparent and expanded on this point, given its novelty.

      While the R<sup>2</sup> value corresponding to the “Friction Coefficient vs. Percent Correct” plot is notably higher, our results show that the slope is negative, which would be statistically spurious. This is because a negative correlation between percent correct (accuracy in discriminating surfaces) and difference in friction coefficient means that the more similar two surfaces are (by friction coefficient), the easier it would be for people to tell them apart. That is, it incorrectly concludes that two identical surfaces would be much easier to tell apart than two surfaces with greatly different friction coefficients.

      This is counterintuitive to nearly all existing results, but we believe our samples were well-positioned to uncover this trend by minimizing variability, by controlling multiple physical parameters in the samples, and that the friction coefficient — typically calculated in the field as an average friction coefficient — ignores all the dynamic changes in forces present in elastic systems undergoing mesoscale friction, i.e., human touch, as seen in Fig. 1 in a mock finger and Fig. 3 in a real finger. By demonstrating this statistically spurious trend, we believe this strongly supports our premise that an alternative to friction coefficient is needed in the design of tactile psychophysics and haptic interfaces.

      We believe that this could have been misinterpreted, so we took several steps to improve clarity, given the importance of this finding: we separated the panel on friction coefficient to its own panel, we changed from R<sup>2</sup> to r throughout, and we added clarifying text. We also added a small section focusing on this spurious trend.
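
      Why reporting r rather than R<sup>2</sup> exposes the direction of a trend can be seen in a small Python sketch. The numbers below are synthetic and purely illustrative, and the identity R<sup>2</sup> = r<sup>2</sup> used here holds for a simple least-squares line; the values in the manuscript come from GLMM fits, for which this identity is only a rough guide.

```python
import math

def _moments(x, y):
    """Return means and centered sums of squares/products for x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return mx, my, sxx, syy, sxy

def pearson_r(x, y):
    """Pearson correlation coefficient; its sign gives the trend direction."""
    _, _, sxx, syy, sxy = _moments(x, y)
    return sxy / math.sqrt(sxx * syy)

def r_squared(x, y):
    """Coefficient of determination of the least-squares line y = a + b*x."""
    mx, my, sxx, syy, sxy = _moments(x, y)
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return 1.0 - ss_res / syy

# Synthetic example: accuracy falling vs. rising with a difference metric.
difference = [0.1, 0.3, 0.5, 0.7, 0.9]
accuracy_falling = [0.9, 0.7, 0.8, 0.5, 0.4]
accuracy_rising = list(reversed(accuracy_falling))

r_down = pearson_r(difference, accuracy_falling)
r_up = pearson_r(difference, accuracy_rising)
r2_down = r_squared(difference, accuracy_falling)
r2_up = r_squared(difference, accuracy_rising)
```

      In this toy case the two data sets yield the same R<sup>2</sup> but correlation coefficients of opposite sign, so only r distinguishes a trend in the expected direction from one in the opposite direction.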

      Our changes to the manuscript (Page 10)

      “To compare the value of looking at frictional instabilities, we also performed GLMM fits on common approaches in the field, like a friction coefficient or material property typically used in tactile discrimination, shown in Fig. 2D-E. Interestingly, in Fig. 2D, we observed a spurious, negative correlation between friction coefficient (typically and often problematically simplified as across all tested conditions) and accuracy (r = -0.64, p < 0.01); that is, the more different the surfaces are by friction coefficient, the less people can tell them apart. This spurious correlation would be the opposite of intuition, and further calls into question the common practice of using friction coefficients in touch-related studies. The alternative, two-term model which includes adhesive contact area for friction coefficient(29) was even less predictive (see Fig. S6A of SI). We believe such a correlation could not have been uncovered previously as our samples are minimal in their physical variations. Yet, the dynamic changes in force even within a single sample are not considered, despite being a key feature of mesoscale friction during human touch.

      We investigate different material properties in Fig. 2E. Differences in average roughness R<sub>a</sub> (or other parameters, like root mean square roughness R<sub>rms</sub>; Fig. S6A of SI) did not show a statistically significant correlation to accuracy. Though roughness is a popular parameter, correlating any roughness parameter to human performance here could be moot: the limit of detecting roughness differences has previously been defined as 13 nm on structured surfaces(33) and much higher for randomly rough surfaces(46), all of which are orders of magnitude larger than the roughness differences between our surfaces. The differences in contact angle hysteresis, as an approximation of the adhesion contributions(47), do not present any statistically significant effects on performance.”

      Comment 2, Part 1

      Along the same lines, other parameters that were considered such as the "Percent Correct vs. Difference in Sp" and "Percent Correct vs. Difference in SFW" were not plotted for consideration in the SI. It would be helpful to compare these results with the other three metrics in order to fully understand the relationships.

      We have added these plots to the SI. We note that we had checked these relationships and discussed them briefly, but did not include the plot. The plots show that the type of instability was not as helpful as its presence or absence.

      Our changes to the manuscript (Page 9)

      “Furthermore, a model accounting for slow frictional waves alone specifically shows a significant, negative effect on performance (p < 0.01, Fig. S5 of SI), suggesting that in these samples and task, the type of instability was not as important.”

      Added (SI Page 4)

      “and no correlation between accuracy and stiction spikes (Fig. S5).”

      Comment 2, Part 2

      Other parameters such as stiction magnitude and differences in friction coefficient over the test space could also be important and interesting.

      We agree these are interesting and have thought about them. We are aware that others, like Gueorguiev et al., have studied stiction magnitudes, and though there was a correlation, the physical differences in surface roughness (glass versus PMMA) investigated made it unclear if these could be generalized further(2). We are unsure how to proceed here with a satisfactory analysis of stiction magnitude, given that stiction spikes are not always generated. In fact, Fig. 1 shows that for many velocities and pressures, they do not form. However, we offer some speculation on why stiction spikes may be overrepresented in the literature:

      (1) They are prone to being created if the finger was loaded for a long time onto a surface prior to movement, thus creating adhesion by contact aging, which is unlike active human exploration. We avoid this by discarding the first pull in our measurements, which is standard practice in mechanical characterization if contact aging needs to be avoided.

      (2) The ranges of velocities and pressures explored were small.

      (3) In an effort to generate strong tactile stimuli, highly adhesive or rough surfaces are used.

      (4) They are visually distinctive on a plot, but we are unaware of any mechanistic reason that mechanoreceptors would be extremely sensitive to this low frequency event over other signals.

      In ongoing work, however, we are always cognizant that if stiction spikes are a dominant factor, then a secondary analysis on their magnitude would be important.

      We interpret “difference in friction coefficient over the test space” to mean, for a single surface such as C4, the difference between the highest and the lowest average friction coefficient, each taken at a single combination of velocity and mass. In the manuscript, we calculated the difference in friction coefficient in the typical manner of the field, by averaging all data collected at all velocities and masses and assigning a single value to each surface, such as C4. We had performed the analysis over the test space and have the data, but we are wary of overinterpreting secondary and tertiary metrics because they have no fundamental basis in traditional tribology. This value, if used by humans, would also suggest that they rapidly explore a large parameter space to find a “maximum” and “minimum” friction. Furthermore, the range in friction across the test space, after averaging, may in fact be smaller than the range of friction in a single measurement. For example, in Fig. 1B, the friction coefficient can be calculated by dividing the measured force by the normal force ([applied mass + 6 g finger] × gravity). The friction coefficient in a single run varies widely, as expected.
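      As a minimal sketch of the calculation just described, the snippet below divides a friction force trace by the normal force ([applied mass + 6 g finger] × gravity). The function names, the value of g, and the force samples are illustrative assumptions, not data or code from the study.

```python
G = 9.81             # m/s^2, assumed value for gravitational acceleration
FINGER_MASS_G = 6.0  # deadweight of the mock finger in grams (from the Methods)


def normal_force_n(applied_mass_g):
    """Normal force in newtons for an applied mass given in grams."""
    return (applied_mass_g + FINGER_MASS_G) / 1000.0 * G


def friction_coefficients(force_trace_n, applied_mass_g):
    """Instantaneous friction coefficient for each force sample in a trace."""
    fn = normal_force_n(applied_mass_g)
    return [f / fn for f in force_trace_n]


# Hypothetical friction force samples (N) from one slide at 100 g applied mass.
trace = [0.8, 1.5, 2.1, 1.2, 0.9]
mu = friction_coefficients(trace, 100.0)
mu_avg = sum(mu) / len(mu)     # the single averaged value typically reported
mu_range = max(mu) - min(mu)   # the spread within this one run
print(f"average mu = {mu_avg:.2f}, range within run = {mu_range:.2f}")
```

      The averaged value hides the spread within the run, which is the point made above about a single measurement.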

      Fig. 2D shows a GLMM fit between percent correct responses across our pairs and the differences in friction coefficient for each pair, where we see a spurious negative correlation. As we had the data of all average friction coefficients for each condition for a given material, we also looked at the difference in maximum and minimum friction coefficients. For our tested pairs, these differences also lined up on a statistically significant, negative GLMM fit (r = -0.86, p < 0.005). However, the values for a given surface can vary drastically, with an interquartile range of 1.20 to 2.09 on a single surface. We fit participant accuracy to the differences in these IQRs across pairs. This also led to a negative GLMM fit (r = -0.65, p < 0.05). However, we are hesitant to add this to the manuscript for the reasons stated previously.
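      The two secondary metrics mentioned above (the max-minus-min range and the interquartile range of per-condition average friction coefficients) could be computed along the following lines. This is only a sketch: the quartile convention, the function names, and the numbers are assumptions, and the GLMM fit itself is not reproduced here.

```python
from statistics import quantiles


def spread(mu_by_condition):
    """Return (max - min range, interquartile range) of per-condition averages."""
    # The "inclusive" quartile method is an assumption; the convention used
    # in the original analysis is not stated.
    q1, _, q3 = quantiles(mu_by_condition, n=4, method="inclusive")
    return max(mu_by_condition) - min(mu_by_condition), q3 - q1


def pair_differences(mu_a, mu_b):
    """Absolute difference in range and in IQR between two surfaces."""
    range_a, iqr_a = spread(mu_a)
    range_b, iqr_b = spread(mu_b)
    return abs(range_a - range_b), abs(iqr_a - iqr_b)


# Hypothetical per-condition average friction coefficients for two surfaces
# (shortened to five conditions each for illustration).
surface_a = [1.0, 2.0, 3.0, 4.0, 5.0]
surface_b = [2.0, 2.5, 3.0, 3.5, 4.0]
print(pair_differences(surface_a, surface_b))
```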

      Comment 3, Part 1

      Beyond this fundamental concern, there is a weakness in the representativeness of the PDMS finger, the vertical motion, and the speed of sliding to real human exploration.

      Overall, this is a continuing debate that we think offers two solutions. There is always a tradeoff between using a synthetic model of a finger versus a real human finger, and there is a place for both models. That is, while our mock finger will be more successful the closer it is to a human finger, it is not our goal to fully replace a human finger. Rather, our goal is to provide a method of characterizing surfaces that is relevant on the length scale of human touch.

      The usefulness of the mock finger is in isolating the features of each surface that are independent of human variability, i.e., instabilities that form without changing loading conditions between sliding motions or even within one sliding motion. Of course, with this method, we still need to confirm that these features form during human exploration, which we show in Fig. 3.

      We believe that this method of characterizing surfaces at the mesoscale will ultimately lead to more successful human studies on tactile perception. Currently, and as shown in the paper, characterizing surfaces through traditional techniques, such as a commercial tribometer (friction coefficient, using a steel or hard metal ball), roughness (via atomic force microscopy or some other metrology), or surface energy, is less predictive. Thus, we believe this mock finger is stronger than the current state of the art for characterizing surfaces (we are also aware of a commercial mock finger company, but we were unable to purchase or obtain an evaluation model).

      One of the main, and severe, limitations of using a human finger is that all fingers are different, meaning any study focusing on a particular user may not apply to others or be recreated easily by other researchers. We cannot set a standard for replication around a real human finger as that participant may no longer be available, or willing to travel the world as a “standard”. Furthermore, the way in which each person changes their pressures and velocities is different. We note that this is a challenge unique to touch perception: how an object is touched changes the friction generated, and thus the tactile stimulus generated, whereas a standardized stimulus is more straightforward for light or sound.

      However, we do emphasize that we have strongly considered the balance between feasibility and ecological validity in the design of a mock finger. We have a mock finger, with the three components of stiffness of a human finger (more below). Furthermore, we have also successfully used this mock finger in correlations with human psychophysics in previous work, where findings from our mechanical experiments were predictive of human performance(3-6).

      Our changes to the manuscript Added (Page 2-3)

      “Mock finger as a characterization tool

      In this work, we use a mechanical setup with a PDMS mock finger to derive tactile predictors from controlled friction traces alternative to average friction coefficients. While there is a tradeoff in selecting a synthetic finger over a more accurate, real human finger in modeling touch, our aim to design a method of mesoscale surface characterization for more successful studies on tactile perception cannot be fulfilled using one human participant as a standard. We believe that with sufficient replication of surface and bulk properties as well as contact geometry, and controlled friction measurements collected at loading conditions observed during a tactile discrimination task, we can isolate unique frictional features of a set of surfaces that do not arise from human-to-human variability.

      The major component of a human finger, by volume, is soft tissue (~56%)(22), resulting in an effective modulus close to 100 kPa(23,24). In order to achieve this same softness, we crosslink PDMS in a 1×1×5 cm mold at a 30:1 elastomer:crosslinker ratio. However, two more features impart increased stiffness in a human finger. Most of this added rigidity is derived from the bone at the fingertip, the distal phalanx(23–25), which we mimic with an acrylic bone within our PDMS network. The stratum corneum, the stiffer, glassier outer layer of skin(26), is replicated with the surface of the mock finger glassified, or further crosslinked, after 8 hours of UV-Ozone treatment(27). This treatment also modifies the surface properties of the native PDMS to align with those of a human finger more closely. It minimizes the viscoelastic tack at the surface, resulting in a comparable non-sticky surface. At least one day after treatment, the finger surface returns to moderate hydrophilicity (~60°), as is typically observed for a real finger(28).

      The initial contact area formed before a friction trace is collected is a rectangle of 1×1 cm. While this shape is not entirely representative of a human finger with curves and ridges, human fingers flatten out enough to reduce the effects of curvature with even very light pressures(28–30). This implies that regardless of finger pressure, the contact area is largely load-independent, which is more accurately replicated with a rectangular mock finger. It is still a challenge to control pressure distribution with this planar interface, but non-uniform pressures are also expected during human exploration.

      Lastly, we consider fingerprints vs. flat fingers. A key finding of our previous work is that while fingerprints enhanced frictional dynamics at certain conditions, key features were still maintained with a flat finger(7). Furthermore, for some loading conditions, the more amplified signals could also result in more similar friction traces for different surfaces. We have continued to use flat fingers in our mechanical experiments, and have observed good agreement between these friction traces and human experiments(7,8,21,31).”

      (Page 3-4, Materials and Methods)

      “Mock Finger Preparation

      Friction forces across all six surfaces were measured using a custom apparatus with a polydimethylsiloxane (PDMS, Dow Sylgard 184) mock finger that relatively closely mimics a human finger’s mechanical properties and contact mechanics while exploring a surface(7,8). PDMS and crosslinker were combined in a 30:1 ratio to achieve a stiffness of 100 kPa comparable to a real finger, then degassed in a vacuum desiccator for 30 minutes. We are aware that the manufacturer-recommended crosslinking ratio for Sylgard 184 is 10:1 due to potential uncrosslinked liquid residues(32), but further crosslinking concentrated at the surface prevents this. The prepared PDMS was then poured into a 1×1×5 cm mold also containing an acrylic 3D-printed “bone” to attach applied masses on top of the “fingertip” area contacting a surface during friction testing. After crosslinking in the mold at 60 °C for 1 hour, the finger was treated with UV-Ozone for 8 hours out of the mold to minimize viscoelastic tack.

      Mechanical Testing

      A custom device using our PDMS mock finger was used to collect macroscopic friction force traces replicating human exploration(7,8). After placing a sample surface on a stage, the finger was lowered at a slight angle such that an initial 1×1 cm rectangle of “fingertip” contact area could be established. We considered a broad range of applied masses (M = 0, 25, 75, and 100 g) added onto the deadweight of the finger (6 g) observed during a tactile discrimination task. The other side of the sensor was connected to a motorized stage (V-508 PIMag Precision Linear Stage, Physikinstrumente) to control both displacement (4 mm across all conditions) and sliding velocity (v = 5, 10, 25, and 45 mm s<sup>-1</sup>). Forces were measured at all 16 combinations of mass and velocity via a 250 g Futek force sensor (k = 13.9 kN m<sup>-1</sup>) threaded to the bone, and recorded at an average sampling rate of 550 Hz with a Keithley 7510 DMM digitized multimeter. Force traces were collected in sets of 4 slides, discarding the first due to contact aging. Because some mass-velocity combinations were near the boundaries of instability phase transitions, not all force traces at these given conditions exhibited similar profiles. Thus, three sets were collected on fresh spots for each condition to observe enough occurrences of multiple instabilities, at a total of nine traces per combination for each surface.”
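      The test matrix described in the Methods excerpt above can be laid out as a short sketch. The masses, velocities, displacement, sampling rate, and set sizes are taken from the text; the value of g, the variable names, and the rounding of sample counts are assumptions made for illustration.

```python
from itertools import product

MASSES_G = [0, 25, 75, 100]        # applied masses (g)
VELOCITIES_MM_S = [5, 10, 25, 45]  # sliding velocities (mm/s)
FINGER_MASS_G = 6                  # deadweight of the mock finger (g)
DISPLACEMENT_MM = 4                # sliding distance for every condition
SAMPLING_HZ = 550                  # average sampling rate
SETS = 3                           # sets per condition, each on a fresh spot
SLIDES_PER_SET = 4
DISCARDED_PER_SET = 1              # first slide dropped due to contact aging
G = 9.81                           # m/s^2, assumed value

conditions = list(product(MASSES_G, VELOCITIES_MM_S))
traces_per_condition = SETS * (SLIDES_PER_SET - DISCARDED_PER_SET)


def slide_summary(mass_g, velocity_mm_s):
    """Normal force, slide duration, and approximate sample count for one slide."""
    duration_s = DISPLACEMENT_MM / velocity_mm_s
    return {
        "normal_force_N": (mass_g + FINGER_MASS_G) / 1000 * G,
        "duration_s": duration_s,
        "approx_samples": round(duration_s * SAMPLING_HZ),
    }


print(len(conditions), "conditions,", traces_per_condition, "traces each")
for mass_g, velocity in conditions:
    print(mass_g, velocity, slide_summary(mass_g, velocity))
```

      At the fastest velocity, a 4 mm slide lasts under a tenth of a second, so each trace contains only a few dozen samples at the stated sampling rate.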

      Added References (Page 13)

      M. Murai, H.-K. Lau, B. P. Pereira and R. W. H. Pho, J. Hand Surg., 1997, 22, 935–941.

      A. Abdouni, M. Djaghloul, C. Thieulin, R. Vargiolu, C. Pailler-Mattei and H. Zahouani, R. Soc. Open Sci., DOI:10.1098/rsos.170321.

      P.-H. Cornuault, L. Carpentier, M.-A. Bueno, J.-M. Cote and G. Monteil, J. R. Soc. Interface, DOI:10.1098/rsif.2015.0495.

      K. Qian, K. Traylor, S. W. Lee, B. Ellis, J. Weiss and D. Kamper, J. Biomech., 2014, 47, 3094–3099.

      Y. Yuan and R. Verma, Colloids Surf. B Biointerfaces, 2006, 48, 6–12.

      Y.-J. Fu, H. Qui, K.-S. Liao, S. J. Lue, C.-C. Hu, K.-R. Lee and J.-Y. Lai, Langmuir, 2010, 26, 4392–4399.

      Comment 3, Part 2

      The real finger has multiple layers with different moduli. In fact, the stratum corneum cells, which are the outer layer at the interface and determine the friction, have a much higher modulus than PDMS.

      We have approximated the softness of the finger with 100 kPa crosslinked PDMS, which is close to what has been reported for the bulk of a human fingertip(8,9). However, as mentioned in the Materials and Methods, there are two additional features of the mock finger that impart greater strength. The PDMS surrounds a rigid, acrylic bone comparable to the distal phalanx, which provides an additional layer of higher modulus(10). Additionally, the 8-hour UV-Ozone treatment decreases the viscoelastic tack of the pristine PDMS by glassifying, or further crosslinking the surface of the finger(11), therefore imparting greater stiffness at the surface similar to the contributions of the stratum corneum, along with a similar surface energy(12). This technique is widely used in wearables(13), soft robotics(14), and microfluidics(15) to induce both these material changes. Additionally, the finger is used at least a day after UV-Ozone treatment is completed in order for the surface to return to moderate hydrophilicity, similar to the outermost layer of human skin(16).

      Comment 3, Part 3

      In addition, the slanted position of the finger can cause non-uniform pressures across the finger. Both can contribute to making the PDMS finger have much more stick-slip than a real finger.

      To ensure that there is minimal contribution from the slanted position of the finger, an initial contact area of 1×1 cm is established before sliding and recording friction measurements. As the PDMS finger is a soft object, the portion in contact with a surface flattens and the contact area remains largely unchanged during sliding. Any additional stick-slip after this alignment step is caused by contact aging at the interface, but the first trace we collect is always discarded to only consider stick-slip events caused by surface chemistry. We recognize that it is difficult to completely control the pressure distribution due to the planar interface, but this is also expected when humans freely explore a surface.

      Comment 3, Part 4

      In fact, if you look at the regime maps, there is very little space that has steady sliding. This does not represent well human exploration of surfaces. We do not tend to use a force and velocity that will cause extensive stick-slip (frequent regions of 100% stick-slip) and, in fact, the speeds used in the study are on the slow side, which also contributes to more stick-slip. At higher speeds and lower forces, all of the materials had steady sliding regions.

      We are not aware of published studies that extensively show that humans avoid stick-slip regimes. In fact, we are familiar with literature where stiction spike formation is suppressed: a recent paper by AliAbbasi, Basdogan et al. investigates electroadhesion and friction with NaCl solution-infused interfaces, resulting in significantly steadier forces(17). We also directly showed evidence of instability formation that we observed during human exploration in Fig. 3B-C. These dynamic events are common, despite the lack of control of normal forces and sliding velocities. We also note that Reviewer 1, Comment 2, was suggesting that we further explore possible trends from parameterizing the stiction spike.

      We note that many studies have often not reached the velocities and masses required for stiction spikes, even though these masses and velocities would be routinely seen in free exploration. This is usually due to constraints of equipment(18). Sliding events during human free exploration of surfaces can exceed 100 mm/s for rapid touches. However, for the surfaces investigated here, we observe that large regions of stick-slip can emerge at velocities as low as 5 mm/s depending on the applied load. The incidence of steady sliding appears more dependent on the applied mass, with almost no steady sliding observed at or above 75 g. Indeed, the force categorization along our transition zones is the main point of the paper.

      Comment 3, Part 5

      Further, on these very smooth surfaces, the friction and stiction are more complex and cannot dismiss considerations such as finger material property change with sweat pore occlusion and sweat capillary forces. Also, the vertical motion of both the PDMS finger and the instructed human subjects is not the motion that humans typically use to discriminate between surfaces.

      We did not describe the task sufficiently. Humans were only given the instruction to slide their finger along a single axis from top to bottom of a sample, not vertical as in azimuthal to gravity. We have updated our wording in the manuscript to reflect this.

      Our changes to the manuscript (Page 4)

      “Participants could touch for as long as they wanted, but were asked to only use their dominant index fingers along a single axis to better mimic the conditions for instability formation during mechanical testing with the mock finger.”

      (Page 11)

      “The participant was then asked to explore each sample simultaneously, and ran over each surface in strokes along a single axis until the participant could decide which of the two had “more friction”.”

      Comment 3, Part 6

      Finally, fingerprints may not affect the shape and size of the contact area, but they certainly do affect the dynamic response and detection of vibrations.

      We are aware of the nuance. Our previous work on the role of fingerprints on friction experienced by a PDMS mock finger showed enhanced signals with the incorporation of ridges on the finger and used a rate-and-state model of a heterogeneous, elastic body to find corresponding trends (though there is no existing model of friction that can accurately model experiments on mesoscale friction)(7). The key conclusion was that a flat finger still preserved key dynamic features, and the presence of stronger or more vibrations could result in more similar forces for different surfaces depending on the sliding conditions.

      This is also in the context that we are seeking to provide a reasonable and experimentally accessible method to characterize surfaces, which will always improve as we get closer to replicating a true human finger. But our goal here was to replicate the finger sufficiently for use in human studies. We believe the more appropriate metric of success is whether the mock finger is more successful than the traditional characterization experiments it replaces, like friction coefficient, roughness, surface energy, etc.

      Comment 4

      This all leads to the critical question, why are friction, normal force, and velocity not measured during the measured human exploration and in a systematic study using the real human finger? The authors posed an extremely interesting hypothesis that humans may alter their speed to feel the instability transition regions. This is something that could be measured with a real finger but is not likely to be correlated accurately enough to match regime boundaries with such a simplified artificial finger.

      We are excited that our manuscript offers a tractable manner to test the hypothesis that tactile decision-making models use friction instabilities as evidence. However, we lay out the challenges and barriers, and how the scope of this paper will lead us in that direction. We also clarify that our goals are to provide a method to characterize samples to better design tactile interfaces in haptics or in psychophysical experiments, and to raise awareness that the common methods of sample characterization in touch, by an average friction coefficient or roughness, are fundamentally unsound.

      In short, in our view, to further support our findings on instabilities would require answering:

      (1) Which one, or which combination, of the multiple swipes that people make is responsible for a tactile decision? (The need for a decision-making model)

      (2) Establish what is, or may be, tactile evidence.

      (3) Establish whether tactile decision-making models are similar to or different from existing decision-making models.

      (4) Test the hypothesis, in these models, that friction instabilities are evidence, and not some other unknown metric. This requires designing samples that vary in the amount of evidence generated, but this evidence cannot be controlled directly. Rather, the samples indirectly vary evidence by how likely it is for a human to generate different types of friction instabilities during standard exploration.

      (5) Design a task that does not require the use of subjective tactile descriptors, like “which one feels rougher”, which we see cause confusion in participants. Such a task will likely require accounting for memory effects.

      We elaborate these points below:

      To successfully perform this experiment, we note that freely exploring humans make multiple strokes on a surface. Therefore, we would need to construct a decision-making model. It has not yet been demonstrated whether tactile decision making follows visual decision making, but perhaps to start, we can assume it does. Then, in the design of our decision-making paradigm, we immediately run into the problem: What is tactile evidence?

      From Fig. 3C, we already can see that identifying evidence is challenging. Prior to this manuscript, people may have chosen the average force, or the highest force. Or we may choose the average friction force. Then, after deciding on the evidence, we need to find a method to manipulate the evidence, i.e., create samples or a machine that causes high friction, etc. We show that during the course of human touch, due to the dynamic nature of friction, the average can change a large amount and sample design becomes a central barrier to experiments. Others may suggest immobilizing the finger and applying a known force, but given how much friction changes with human exploration, there is no known method to make a machine recreate temporally and spatially varying friction forces during sliding onto a stationary finger. Perhaps most importantly, in addition to mechanical challenges, a study by Liu, Colgate et al. showed that even if they recorded the friction (2D) of a finger exploring a surface and then replicated the same friction forces onto a finger, the participant could not determine which surface the replayed friction force was supposed to represent(1). This supports that the efference copy is important, that the forces in response to expected motion are important to determine friction. Finally, there is no known method to design instabilities a priori. They must be found through experiments, especially since introducing, say, a bump or a trough would bring in confounding variables to how participants tell surfaces apart.

      Furthermore, even if we had some consistent method to create tactile “evidence”, the paradigm also deserves some consideration. In our experience, the 3-AFC task we perform is important because the vocabulary for touch has not been established. That is, in 3-AFC, by asking to determine which one sample is unlike the others, we do not have to ask the participant questions like “which one is rougher” or “which one has less friction”. In contrast, 2-AFC, which is better for decision-making models because it does not include memory, requires the asking of a perceptual question like: “which one is rougher?”. In our ongoing work, taking two silane coatings, we found that participants could easily identify which surface is unlike the others above chance in a 3-AFC, but participants, even within their own trials, could not consistently identify one silane as perceptually “rougher” by 2-AFC. To us, this calls into question the validity of tactile descriptors, but is beyond the scope of this manuscript.

      This is not our only goal, but in the context of human exploration, we believed it was important in this manuscript to identify a mechanical parameter that was consistent with how humans explore surfaces, but that could also correspond to some consistent property of a surface, irrespective of whether a human was touching it. We thought that designing human decision-making models and paradigms around the friction coefficient would not be successful.

      Given the scope of these challenges, we do not think it would be possible to establish these conceptual sequences in a single manuscript.

      Reviewer 2 (Public review):

      Summary:

      In this paper, the authors want to test the hypothesis that frictional instabilities rather than friction are the main drivers for discriminating flat surfaces of different sub-nanometric roughness profiles.

      They first produced flat surfaces with 6 different coatings giving them unique and various properties in terms of roughness (picometer scale), contact angles (from hydrophilic to hydrophobic), friction coefficient (as measured against a mock finger), and Hurst exponent.

      Then, they used those surfaces in two different experiments. In the first experiment, they used a mock finger (PDMS of 100 kPa molded into a fingertip shape) and slid it over the surfaces at different normal forces and speeds. They categorized the sliding behavior as steady sliding, sticking spikes, and slow frictional waves by visual inspection, and show that the surfaces have different behaviors depending on normal force and speed. In a second experiment, participants (10) were asked to discriminate pairs of those surfaces. It is found that each of those pairs could be reliably discriminated by most participants.

      Finally, the participant's discrimination performance is correlated with differences in the physical attributes observed against the mock finger. The authors found a positive correlation between participants' performances and differences in the count of steady sliding against the mock finger and a negative correlation between participants' reaction time and differences in the count of stiction spikes against the mock finger. They interpret those correlations as evidence that participants use those differences to discriminate the surfaces.

      Strengths:

      The created surfaces are very interesting as they are flat at the nanometer scale, yet have different physical attributes and can be reliably discriminated.

      We thank Reviewer 2 for their notes on our manuscript. The responses below address the reviewer’s comments and recommendations for revised work.

      Weaknesses:

      Comment 1

      In my opinion, the data presented in the paper do not support the conclusions. The conclusions are based on a correlation between results obtained on the mock finger and results obtained with human participants but there is no evidence that the human participants' fingertips will behave similarly to the mock finger during the experiment. Figure 3 gives a hint that the 3 sliding behaviors can be observed in a real finger, but does not prove that the human finger will behave as the mock finger, i.e., there is no evidence that the phase maps in Figure 1C are similar for human fingers and across different people that can have very different stiffness and moisture levels.

      The mechanical characterization conducted with the mock finger seeks to extract significant features of friction traces of a set of surfaces to use as predictors of tactile discriminability. The goal is to find a consistent method to characterize surfaces for use in tactile experiments that can be replicated by others and used prior to any human experiments. However, in the overall response and in a response to a similar comment by Reviewer 1, we also explain why we believe experiments on humans to establish this fact are not yet reasonable.

      Comment 2

      I believe that the authors collected the contact forces during the psychophysics experiments, so this shortcoming could be solved if the authors use the actual data, and show that the participant responses can be better predicted by the occurrence of frictional instabilities than by the usual metrics on a trial by trial basis, or at least on a subject by subject basis. I.e. Poor performers should show fewer signs of differences in the sliding behaviors than good performers.

      To fully implement this, a decision-making model is necessary because, as a counter example, a participant could have generated 10 swipes of SFW and 1 swipe of a Sp, but the Sp may have been the most important event for making a tactile decision. This type of scenario is not compatible with the analysis suggested — and similar counterpoints can be made for other types of seemingly straightforward analysis.

      While we are interested and actively working on this, the study here is critical to establish types of evidence for a future decision-making model. We know humans change their friction constantly during real exploration, so it is unclear which of these constantly changing values we should input into the decision making model, and the future challenges we anticipate are explained in Comment 1.

      Comment 3

      The sample size (10) is very small.

      We recognize that, with all factors being equal, this sample size is on the smaller end. However, we emphasize that the degree of control of samples is far above typical, with minimal variations in sample properties such as surface roughness, and every sample for every trial was pristine. Furthermore, the sample preparation (> 300 individual wafers were used) and cost became a factor. Although not typically appropriate, and thus not included in the manuscript, a post-hoc power analysis for the 100 trials of the pair closest to chance, P4 (53% accuracy against a chance level of 33%), showed a power of 98.2%, suggesting that the study was appropriately powered.
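
      As an illustration of how a post-hoc power figure of this kind can be obtained, the sketch below computes the power of an exact one-sided binomial test for 100 trials, a chance level of 1/3, and an assumed true accuracy of 0.53. The choice of test, the one-sided alternative, and the significance level of 0.05 are assumptions made for this example; the response does not state which procedure was used, so the result need not reproduce the quoted 98.2% exactly.

```python
from math import comb


def binom_tail(n, k, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def exact_power(n, p_chance, p_true, alpha=0.05):
    """Power of an exact one-sided binomial test (assumed test, not the authors')."""
    # Smallest number of correct answers that rejects "performance is at chance".
    k_crit = next(k for k in range(n + 2) if binom_tail(n, k, p_chance) <= alpha)
    # Probability of reaching that count when the true accuracy is p_true.
    return k_crit, binom_tail(n, k_crit, p_true)


# Values taken from the response: 100 trials, 3-alternative task, 53% accuracy.
k_crit, power = exact_power(n=100, p_chance=1 / 3, p_true=0.53)
print(f"critical count = {k_crit}, power = {power:.3f}")
```

      A two-sided test or a normal approximation would shift the critical count slightly and therefore change the reported power by a small amount.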

      Reviewer 2 (Recommendations for the authors):

      Comment 1

      Differences in SS and Sp (Table 2) are NOT physical or mechanical differences but are obtained by counting differences in the number of occurrences of each sliding behavior. It is rather a weird choice.

      We disagree that differences in SS and Sp are not physical or mechanical, as these are well-established phenomena in the soft matter and tribology literature(19-21). These are known as “mechanical instabilities” and are generated by the interplay of two physical phenomena: the elasticity of the finger (which is constant in our mechanical testing) and the friction forces present (which change per sample type). The motivation behind using these different shapes is that the instabilities, in some conditions, can be invariant to external factors like velocity. This would be quite advantageous for human exploration because, unlike friction coefficient, which changes with nearly any factor, including velocity and mass, the instabilities being invariant to velocity would mean that we are accurately characterizing a unique identifier of the surface even though velocity may be variable.

      This “weird choice” is the central innovation of this paper. This choice was necessary because we demonstrated that the common usage of friction coefficient is fundamentally flawed: we see that friction coefficient suggests that surfaces which are more different would feel more similar; indeed, the most distinctive surfaces would be two surfaces that are identical, which is clearly spurious. One potential explanation for why we were able to see this effect is that our surfaces have similar (< 0.6 nm variability) roughness, removing potential confounding factors, and this type of low roughness control has not been used in tactile studies to the best of our knowledge.

      Comment 2

      Figures 2B-C: why are the x-data different than Table 2?

      The x-data in Fig. 2B-C are the absolute differences in the number of occurrences measured for a given instability type or material property out of 144 pulls. Modeling the human participant results in our GLMMs required the independent variables to be in this form rather than percentages. We initially chose to list percent differences in Table 2 to highlight the ranges of differences instead of an absolute value, but have added both for clarity.
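
      To make the relation between the two forms concrete, the sketch below converts occurrence counts for a pair of surfaces into an absolute difference and a percent difference. The counts are hypothetical, and expressing the percentage relative to the 144 pulls per surface is an assumption about how the Table 2 percentages were computed.

```python
TOTAL_PULLS = 144  # pulls per surface, as stated in the response


def occurrence_difference(count_a, count_b, total=TOTAL_PULLS):
    """Return (absolute difference, percent difference) in occurrences of one instability."""
    abs_diff = abs(count_a - count_b)
    # Assumed definition: the difference as a percentage of all pulls on a surface.
    pct_diff = 100.0 * abs_diff / total
    return abs_diff, pct_diff


# Hypothetical counts of one instability type on two surfaces of a pair.
abs_diff, pct_diff = occurrence_difference(90, 54)
print(abs_diff, pct_diff)
```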

      Our changes to the manuscript (Page 7)

      “To determine if humans can detect these three different instabilities, we selected six pairs of surfaces to create a broad range of potential instabilities present across all three types. These are summarized in Table 2, where the first column for each instability is the difference in occurrence of that instability formed between each pair, and the second is the percent difference.”

      Comment 3

      "We constructed a set of coated surfaces with physical differences which were imperceptible by touch but created different types of instabilities based on how quickly a finger is slid and how hard a human finger is pressed during sliding." Yet, in your experiment, participants could discriminate them, so this is incoherent.

      To clarify the point, macroscopic objects can differ in physical shape and in chemical composition. What we meant was that the physical differences, i.e., roughness, were below the limit (Skedung et al.) at which participants would be able to tell the surfaces apart without a coating(22). Therefore, the reason people could tell our surfaces apart was the chemical composition of the surface, and not any differences in roughness or physical effects like film stiffness (due to the molecular-scale thinness of the surface coatings, they are mechanically negligible). However, we concede that at the molecular scale, the traditional macroscopic distinction between physical and chemical is blurred.

      We have made minor revisions to the wording in the abstract. We clarify that the surface coatings had physical differences in roughness that were smaller than 0.6 nm, which, based purely on roughness, would not be expected to be distinguishable to participants. Therefore, the reason participants can tell these surfaces apart is differences in friction generated by chemical composition, and we were able to minimize contributions from physical differences in the samples in our study.

      Our changes to the manuscript (Page 1, Abstract)

      “We constructed a set of coated surfaces with minimal physical differences that by themselves, are not perceptible to people, but instead, due to modification in surface chemistry, the surfaces created different types of instabilities based on how quickly a finger is slid and how hard a human finger is pressed during sliding.”

      Reviewer 3 (Public review):

      Strengths:  

      The paper describes a new perspective on friction perception, with the hypothesis that humans are sensitive to the instabilities of the surface rather than the coefficient of friction. The paper is very well written and with a comprehensive literature survey.

      One of the central tools used by the author to characterize the frictional behavior is the frictional instabilities maps. With these maps, it becomes clear that two different surfaces can have both similar and different behavior depending on the normal force and the speed of exploration. It puts forward that friction is a complicated phenomenon, especially for soft materials.

      The psychophysics study is centered around an odd-one-out protocol, which has the advantage of avoiding any external reference to what would mean friction or texture for example. The comparisons are made only based on the texture being similar or not.

      The results show a significant relationship between the distance between frictional maps and the success rate in discriminating two kinds of surface.

      We thank Reviewer 3 for their notes and interesting discussion points on our manuscript. Below, we address the reviewer’s feedback and comments on related works.

      Weaknesses:

      Comment 1

      The main weakness of the paper comes from the fact that the frictional maps and the extensive psychophysics study are not made at the same time, nor with the same finger. The frictional maps are produced with an artificial finger made out of PDMS which is a poor substitute for the complex tribological properties of skin.

      A similar comment was made by Reviewers 1 and 2 and parts are replicated below. We are not claiming that our PDMS fingers are superior to real fingers, but rather, we cannot establish standards in the field by using real human fingers that vary between subjects and researchers. We believe the mock finger we designed is a reasonable mimic of the human finger by matching surface energy, heterogeneous mechanical structure, and the ability to test multiple physiologically relevant pressures and sliding velocities.

      We achieve a heterogeneous mechanical structure with the 3 primary components of stiffness of a human finger. The effective modulus of ~100 kPa, from soft tissue(8,9), is obtained with a 30:1 ratio of PDMS to crosslinker. The PDMS also surrounds a rigid, acrylic bone comparable to the distal phalanx, which provides an additional layer of higher modulus(10). Additionally, the 8-hour UV-Ozone treatment decreases the viscoelastic tack of the pristine PDMS by glassifying, or further crosslinking, the surface of the finger(11), therefore imparting greater stiffness at the surface similar to the contributions of the stratum corneum, along with a similar surface energy(12). The finger is used at least a day after UV-Ozone treatment is completed in order for the surface to return to moderate hydrophilicity, similar to the outermost layer of human skin(16).

      We also discuss the shape of the contact formed. To ensure that there is minimal contribution from the slanted position of the finger, an initial contact area of 1×1 cm is established before sliding and recording friction measurements. As the PDMS finger is a soft object, the portion in contact with a surface flattens and the contact area remains largely unchanged during sliding. We recognize that it is difficult to completely control the pressure distribution due to the planar interface, but this variation is also expected when humans freely explore a surface.

      Finally, we consider flat vs. fingerprinted fingers. Our previous work on the role of fingerprints in the friction experienced by a PDMS mock finger showed enhanced signals with the incorporation of ridges on the finger and used a rate-and-state model of a heterogeneous, elastic body to find corresponding trends(7). The key conclusion was that a flat finger still preserved key dynamic features, and the presence of stronger or more frequent vibrations could result in more similar forces for different surfaces depending on the sliding conditions. We note that we have used the controlled mechanical data collected with this flat mock finger in correlations with human psychophysics in previous work, where findings from our mechanical experiments were predictive of human performance(3–6). Ultimately, we see from our prior work and here that, despite the drawbacks of our mock finger, it outperforms other standard characterization techniques in providing information about the mesoscale that correlates to tactile perception. We have added these details to the manuscript.

      We also note that an intermediate option, replicating real fingers, even in a mold, may also inadvertently limit trends from characterization to a specific finger. One of the main, and severe, limitations of using a human finger is that all fingers are different, meaning any study focusing on a particular user may not apply to others or be recreated easily by other researchers. We cannot set a standard for replication around a real human finger as that participant may no longer be available, or willing to travel the world as a “standard”. Furthermore, the way in which a single person changes their pressures and velocities as they touch a surface is highly variable. As noted in the Summary Response, a study by Colgate et al. (IEEE ToH 2024) demonstrated that efference copies may be important, and thus constraining a human finger and replaying the forces recorded during free exploration will not lead to the participant identifying a surface with any consistency. Thus, it is important to allow humans to freely explore surfaces, but doing so creates nearly limitless variability in friction forces.

      This is also against the backdrop that we are seeking to provide a method to characterize surfaces, which will be aided as we get closer to replicating a true human finger. Indeed, the more features we replicate, the more successful the mechanical data will be in correlating to tactile distinguishability. But reasonably, our success would be in replacing traditional characterization experiments, not in recreating the forces of an arbitrary human finger.

      Our changes to the manuscript (added, Pages 2-3)

      “Mock finger as a characterization tool

      In this work, we use a mechanical setup with a PDMS mock finger to derive tactile predictors from controlled friction traces alternative to average friction coefficients. While there is a tradeoff in selecting a synthetic finger over a more accurate, real human finger in modeling touch, our aim to design a method of mesoscale surface characterization for more successful studies on tactile perception cannot be fulfilled using one human participant as a standard. We believe that with sufficient replication of surface and bulk properties as well as contact geometry, and controlled friction measurements collected at loading conditions observed during a tactile discrimination task, we can isolate unique frictional features of a set of surfaces that do not arise from human-to-human variability.

      The major component of a human finger, by volume, is soft tissue (~56%)(22), resulting in an effective modulus close to 100 kPa(23,24). In order to achieve this same softness, we crosslink PDMS in a 1×1×5 cm mold at a 30:1 elastomer:crosslinker ratio. However, two more features impart increased stiffness in a human finger. Most of this added rigidity is derived from the bone at the fingertip, the distal phalanx(23-25), which we mimic with an acrylic bone within our PDMS network. The stratum corneum, the stiffer, glassier outer layer of skin(26), is replicated with the surface of the mock finger glassified, or further crosslinked, after 8 hours of UV-Ozone treatment(27). This treatment also modifies the surface properties of the native PDMS to align with those of a human finger more closely. It minimizes the viscoelastic tack at the surface, resulting in a comparable non-sticky surface. At least one day after treatment, the finger surface returns to moderate hydrophilicity (~60º), as is typically observed for a real finger(28).

      The initial contact area formed before a friction trace is collected is a rectangle of 1×1 cm. While this shape is not entirely representative of a human finger with curves and ridges, human fingers flatten out enough to reduce the effects of curvature with even very light pressures(28-30). This implies that regardless of finger pressure, the contact area is largely load-independent, which is more accurately replicated with a rectangular mock finger. It is still a challenge to control pressure distribution with this planar interface, but non-uniform pressures are also expected during human exploration.

      Lastly, we consider fingerprints vs. flat fingers. A key finding of our previous work is that while fingerprints enhanced frictional dynamics at certain conditions, key features were still maintained with a flat finger(7). Furthermore, for some loading conditions, the more amplified signals could also result in more similar friction traces for different surfaces. We have continued to use flat fingers in our mechanical experiments, and have observed good agreement between these friction traces and human experiments(7,8,21,31).”

      (Page 3-4, Materials and Methods)

      “Mock Finger Preparation

      Friction forces across all six surfaces were measured using a custom apparatus with a polydimethylsiloxane (PDMS, Dow Sylgard 184) mock finger that mimics a human finger’s mechanical properties and contact mechanics while exploring a surface relatively closely(7,8). PDMS and crosslinker were combined in a 30:1 ratio to achieve a stiffness of 100 kPa comparable to a real finger, then degassed in a vacuum desiccator for 30 minutes. We are aware that the manufacturer-recommended crosslinking ratio for Sylgard 184 is 10:1 due to potential uncrosslinked liquid residues(32), but further crosslinking concentrated at the surface prevents this. The prepared PDMS was then poured into a 1×1×5 cm mold also containing an acrylic 3D-printed “bone” to attach applied masses on top of the “fingertip” area contacting a surface during friction testing. After crosslinking in the mold at 60ºC for 1 hour, the finger was treated with UV-Ozone for 8 hours out of the mold to minimize viscoelastic tack.

      Mechanical Testing

      A custom device using our PDMS mock finger was used to collect macroscopic friction force traces replicating human exploration(7,8). After placing a sample surface on a stage, the finger was lowered at a slight angle such that an initial 1×1 cm rectangle of “fingertip” contact area could be established. We considered a broad range of applied masses (M = 0, 25, 75, and 100 g) added onto the deadweight of the finger (6 g) observed during a tactile discrimination task. The other side of the sensor was connected to a motorized stage (V-508 PIMag Precision Linear Stage, Physikinstrumente) to control both displacement (4 mm across all conditions) and sliding velocity (v = 5, 10, 25, and 45 mm s<sup>-1</sup>). Forces were measured at all 16 combinations of mass and velocity via a 250 g Futek force sensor (k = 13.9 kN m<sup>-1</sup>) threaded to the bone, and recorded at an average sampling rate of 550 Hz with a Keithley 7510 DMM digitized multimeter. Force traces were collected in sets of 4 slides, discarding the first due to contact aging. Because some mass-velocity combinations were near the boundaries of instability phase transitions, not all force traces at these given conditions exhibited similar profiles. Thus, three sets were collected on fresh spots for each condition to observe enough occurrences of multiple instabilities, at a total of nine traces per combination for each surface.”
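
      The bookkeeping in the protocol above can be checked with a short sketch. It enumerates the mass-velocity conditions and counts the retained traces, using only the values quoted in the Methods text; the variable names are illustrative.

```python
from itertools import product

ADDED_MASSES_G = [0, 25, 75, 100]   # applied masses M, in grams
VELOCITIES_MM_S = [5, 10, 25, 45]   # sliding velocities v, in mm/s
SETS_PER_CONDITION = 3              # sets collected on fresh spots
SLIDES_PER_SET = 4                  # slides in each set
DISCARDED_PER_SET = 1               # first slide dropped due to contact aging

# Every combination of applied mass and sliding velocity.
conditions = list(product(ADDED_MASSES_G, VELOCITIES_MM_S))

# Traces kept per condition and per surface after discarding the first slide.
traces_per_condition = SETS_PER_CONDITION * (SLIDES_PER_SET - DISCARDED_PER_SET)
traces_per_surface = len(conditions) * traces_per_condition

print(len(conditions), traces_per_condition, traces_per_surface)
```

      The per-surface total agrees with the 144 pulls cited in the response to Reviewer 2 about Fig. 2B-C.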

      Added References (Page 13)

      M. Murai, H.-K. Lau, B. P. Pereira and R. W. H. Pho, J. Hand Surg., 1997, 22, 935–941.

      A. Abdouni, M. Djaghloul, C. Thieulin, R. Vargiolu, C. Pailler-Mattei and H. Zahouani, R. Soc. Open Sci., DOI:10.1098/rsos.170321.

      P.-H. Cornuault, L. Carpentier, M.-A. Bueno, J.-M. Cote and G. Monteil, J. R. Soc. Interface, DOI:10.1098/rsif.2015.0495.

      K. Qian, K. Traylor, S. W. Lee, B. Ellis, J. Weiss and D. Kamper, J. Biomech., 2014, 47, 3094–3099.

      Y. Yuan and R. Verma, Colloids Surf. B Biointerfaces, 2006, 48, 6–12.

      Y.-J. Fu, H. Qui, K.-S. Liao, S. J. Lue, C.-C. Hu, K.-R. Lee and J.-Y. Lai, Langmuir, 2010, 26, 4392–4399.

      Comment 2

      The evidence would have been much stronger if the measurement of the interaction was done during the psychophysical experiment. In addition, because of the protocol, the correlation is based on aggregates rather than on individual interactions.

      Our Response: We agree that this would have helped further establish our argument, but in the overall statement and in other reviewer responses, we describe the significant challenges to establishing this.

      To fully implement this, a decision-making model is necessary because, as a counter example, a participant could have generated 10 swipes of SFW and 1 swipe of a Sp, but the Sp may have been the most important event for making a tactile decision. We also clarify that our goals are to provide a method to characterize samples to better design tactile interfaces in haptics or in psychophysical experiments.

      In short, in our view, to develop a decision-making model, the challenges are as follows:

      (1) Which one, or which combination, of the multiple swipes that people make is responsible for a tactile decision?

      (2) Establish what is, or may be, tactile evidence.

      (3) Establish whether tactile decision-making models are similar to or different from existing decision-making models.

      (4) Test the hypothesis, in these models, that friction instabilities are evidence, and not some other unknown metric.

      (5) Design a task that does not require the use of subjective tactile descriptors, like “which one feels rougher”, which we see cause confusion in participants, which will likely require accounting for memory effects.

      (6) Design samples that vary in the amount of evidence generated, but this evidence cannot be controlled directly. Rather, the samples indirectly vary evidence by how likely it is for a human to generate different types of friction instabilities during standard exploration.

      We elaborate these points below:

      To successfully perform this experiment, we note that freely exploring humans make multiple strokes on a surface. Therefore, we would need to construct a decision-making model. It has not yet been demonstrated whether tactile decision making follows visual decision making, but perhaps to start, we can assume it does. Then, in the design of our decision-making paradigm, we immediately run into the problem: What is tactile evidence?

      From Fig. 3C, we can already see that identifying evidence is challenging. Prior to this manuscript, people may have chosen the average force, or the highest force. Or we may choose the average friction force. Then, after deciding on the evidence, we need to find a method to manipulate the evidence, i.e., create samples or a machine that causes high friction, etc. We show that during the course of human touch, due to the dynamic nature of friction, the average can change by a large amount and sample design becomes a central barrier to experiments. Others may suggest immobilizing the finger and applying a known force, but given how much friction changes with human exploration, there is no known method to make a machine recreate temporally and spatially varying friction forces during sliding onto a stationary finger. Perhaps most importantly, in addition to mechanical challenges, a study by Liu, Colgate et al. showed that even if they recorded the friction (2D) of a finger exploring a surface and then replicated the same friction forces onto a finger, the participant could not determine which surface the replayed friction force was supposed to represent(1). This supports the view that the efference copy is important, that is, that the forces in response to expected motion are important to determine friction. Finally, there is no known method to design instabilities a priori. They must be found through experiments, especially since if we were to introduce, say, a bump or a trough, we would bring in confounding variables in how participants tell surfaces apart.

      Furthermore, even if we had some consistent method to create tactile “evidence”, the paradigm also deserves some consideration. In our experience, the 3-AFC task we perform is important because the vocabulary for touch has not been established. That is, in 3-AFC, by asking to determine which one sample is unlike the others, we do not have to ask the participant questions like “which one is rougher” or “which one has less friction”. In contrast, 2-AFC, which is better for decision-making models because it does not include memory, requires the asking of a perceptual question like: “which one is rougher?”. In our ongoing work, taking two silane coatings, we found that participants could easily identify which surface is unlike the others above chance in a 3-AFC, but participants, even within their own trials, could not consistently identify one silane as perceptually “rougher” by 2-AFC. To us, this calls into question the validity of tactile descriptors, but is beyond the scope of the current manuscript.

      This is not our only goal, but in the context of human exploration, in this manuscript, we believed it was important to identify a mechanical parameter that was consistent with how humans explore surfaces, but was also a parameter that could characterize some consistent property of a surface, irrespective of whether a human was touching it. We thought that designing human decision-making models and paradigms around the friction coefficient would not be successful.

      Given the scope of these challenges, we do not think it would be possible to establish this conceptual sequence in a single manuscript.

      Comment 3

      The authors compensate with a third experiment where they used a 2AFC protocol and an online force measurement. But the results of this third study, fail to convince the relation.

      With this experiment, our central goal was to demonstrate that the instabilities we have identified with the PDMS finger also occur with a human finger. Several instances of SS, Sp, and SFW were recorded with this setup as a participant touched surfaces in real time.

      Comment 4

      No map of the real finger interaction is shown, bringing doubt to the validity of the frictional map for something as variable as human fingers.

      Real fingers change constantly during exploration, and friction is state-dependent, meaning that the friction will depend on how the person was moving the moment prior. Therefore, a map is only valid for a single human movement – even if participants all were instructed to take a single swipe and start from zero motion, humans are unable to maintain constant velocities and pressures. Clearly, this is not sustainable for any analysis, and these drawbacks apply to any measured parameter, whether instabilities suggested here, or friction coefficients used throughout. We believe the difficulty of this approach emphasizes why a standard map of characterization of a surface by a mock finger, even with its drawbacks, is a viable path forward.

      Reviewer 3 (Recommendations for the authors):

      Comment 1

      It would be interesting to comment on a potential connection between the frictional instability maps and Schallamach waves.

      Schallamach waves are a subset of slow frictional waves (SFW) and are very specifically defined. They are pockets of air that form between a soft sliding object and a rigid surface, and propagate rear-to-front (retrograde waves) as the soft object is slid and buckles due to adhesive pinning. Wrinkles form at the detached portion of the soft material, until the interface reattaches and the process repeats(23). There is typically a high burden of proof to establish a Schallamach wave over a more general slow frictional wave. We note that it would be exceedingly difficult to design samples that can reliably create subsets of SFW, but we are aware that this may be an interesting question at a future point in our work.

      Comment 2

      The force sensors look very compliant, and given the dynamic nature of the signal, it is important to characterize the frequency response of the system to make sure that the fluctuations are not amplified.

      Our Response: Thank you for noticing. We mistyped the sensor spring constant as 13.9 N m<sup>-1</sup> instead of kN m<sup>-1</sup>. However, below we show how the instabilities are derived from the mechanics at the interface due to the compliance of the finger. The “springs” of the force sensor and PDMS finger are connected in parallel. Since k<sub>sensor</sub> = 13.9 kN m<sup>-1</sup>, the spring constant of the system overall reflects the compliance of the finger, and highlights the oscillations arising solely from stick-slip. A sample calculation is shown below.

      Author response image 1.

      Fitting a line to the initial slope of the force trace for C6 gives the equation y = 25.679x – 0.2149. The slope here represents force data over time data, and is divided by the velocity (25 mm/s) to determine the spring constant of the system. This value is lower than k<sub>sensor</sub> = 13.9 kN/m, indicating that the “springs” representing the force sensor and PDMS finger are connected in parallel. The finger is the compliant component of the system, with k<sub>finger</sub> = 0.902 N/m, and of course, real human fingers are also compliant so this matches our goals with the design of the mock finger.

      Our changes to the manuscript (Page 4)

      (k \= 13.9 kN m<sup>-1</sup>)

      Comment 3

      The authors should discuss about the stochastic nature of friction:

      Wiertlewski, Hudin, Hayward, IEEE WHC 2011

      Greenspon, McLellan, Lieber, Bensmaia, JRSI 2020

      We believe that, given the references, this comment on “stochastic” refers to the macroscopically observable fluctuations (i.e., the mechanical “noise” which is not due to instrument noise) in friction arising from the discordant network of stick-slip phenomena occurring throughout the contact zone, and not to the stochastic nature of nanoscale friction that arises from thermal fluctuations or from statistical distributions in bond breaking associated with soft contact.

      We first note that our small-scale fluctuations do not arise from a periodic surface texture that dominates in the frequency regime. However, even on our comparatively smooth surfaces, we do expect fluctuations due to nanoscale variation in contact, generation of stick-slip at microscale length scales that occurs either concurrently or discordantly across the contact zone, and the nonlinear dependence of friction on nearly any variation in state and composition(7).

      Perhaps the most relevant point for the manuscript is that a major advantage of analysis by frictional instabilities is that it sidesteps these ever-present microscale fluctuations, leading to more clearly defined classifiers or categories during analysis. Wiertlewski et al. showed that repeated measurements in their systems ultimately gave rise to consistent frequencies(24) (we think their system was in a steady sliding regime and the patterning gave rise to underlying macroscopic waves). These consistent frequencies, at least in soft systems and absent obvious macroscopic patterned features, would be expected to arise from the instability categories, and we see them throughout.

      Comment 4

      It is stated that "we observed a spurious, negative correlation between friction coefficient and accuracy”.

      What makes you qualify that correlation as spurious?

      We mean this as in the statistical definition of “spurious”.

      This correlation would indicate that by the metric of friction coefficient, more different surfaces are perceived more similarly. Thus, two very different surfaces, like Teflon and sandpaper, by friction coefficient would be expected to feel very similar. Two nearly identical surfaces would be expected to feel very different – but of course, humans cannot consistently distinguish two identical surfaces. This finding is counterintuitive and refutes that friction coefficient is a reliable classifier of surfaces by touch. We do not think it is productive to determine a mechanism for a spurious correlation, but perhaps one reason we were able to observe this is because our study, to the best of our knowledge, is unique for having samples that are controlled in their physical differences in roughness and surface features.

      Our changes to the manuscript (Page 10)

      “To compare the value of looking at frictional instabilities, we also performed GLMM fits on common approaches in the field, like a friction coefficient or material property typically used in tactile discrimination, shown in Fig. 2D-E. Interestingly, in Fig. 2D, we observed a spurious, negative correlation between friction coefficient (typically and often problematically simplified as a single average across all tested conditions) and accuracy (r = -0.64, p < 0.01); that is, the more different the surfaces are by friction coefficient, the less people can tell them apart. This spurious correlation would be the opposite of intuition, and further calls into question the common practice of using friction coefficients in touch-related studies. The alternative, two-term model which includes adhesive contact area for friction coefficient(29) was even less predictive (see Fig. S6A of SI). We believe such a correlation could not have been uncovered previously as our samples are minimal in their physical variations. Yet, the dynamic changes in force even within a single sample are not considered, despite being a key feature of mesoscale friction during human touch.

      We investigate different material properties in Fig. 2E. Differences in average roughness R<sub>a</sub> (or other parameters, like root mean square roughness R<sub>rms</sub>; Fig. S6A of SI) did not show a statistically significant correlation to accuracy. Though roughness is a popular parameter, correlating any roughness parameter to human performance here could be moot: the limit of detecting roughness differences has previously been defined as 13 nm on structured surfaces(33) and much higher for randomly rough surfaces(46), all of which are magnitudes larger than the roughness differences between our surfaces. The differences in contact angle hysteresis, as an approximation of the adhesion contributions(47), do not present any statistically significant effects on performance.”

      Comment 5

      The authors should comment on the influence of friction on perceptual invariance. Despite inducing radically different frictional behavior for various conditions, these surfaces are stably perceived. Maybe this is a sign that humans extract a different metric?

      We agree – we are excited that frictional instabilities may offer a more stable perceptual cue because they are not prone to fluctuations (Recommendations for the authors, Comment 3) and instability formation, in many conditions, is invariant to applied pressures and velocities – thus forming large zones where a human may reasonably encounter a given instability.

      Raw friction is highly prone to variation during human exploration (in alignment with Recommendations for the authors, Comment 3), but ongoing work seeks to explain tactile constancy, or the ability to identify objects despite these large changes in force. Very recently published work by Fehlberg et al. identified the role of modulating finger speed and normal force in amplifying the differences in friction coefficient between materials in order to identify them(25), and we postulate that their work may be streamlined and consistent with the idea of friction instabilities, though we have not had a chance to discuss this in depth with the authors yet.

      We think that the instability maps show a viable path forward to how surfaces are stably perceived, and instabilities themselves show a potential mechanism: mathematically, instabilities for given conditions can be invariant to velocity or mass, creating zones where a certain instability is encountered. This reduces the immense variability of friction to a smaller, more stable classification of surfaces (e.g., a 30% SS surface or a 60% SS surface). A given surface will typically produce the same instability at a specific condition (we found some boundaries are extremely condition-sensitive, but many are not), whereas a single friction trace, which is highly prone to variation, is not a stable metric.

      Added References (Page 14)

      53 M. Fehlberg, E. Monfort, S. Saikumar, K. Drewing and R. Bennewitz, IEEE Trans. Haptics, 2024, 17, 957–963.

      References

      Z. Liu, J.-T. Kim, J. A. Rogers, R. L. Klatzky and J. E. Colgate, IEEE Trans. Haptics, 2024, 17, 441–450.

      D. Gueorguiev, S. Bochereau, A. Mouraux, V. Hayward and J.-L. Thonnard, Sci Rep, 2016, 6, 25553.

      C. W. Carpenter, C. Dhong, N. B. Root, D. Rodriquez, E. E. Abdo, K. Skelil, M. A. Alkhadra, J. Ramírez, V. S. Ramachandran and D. J. Lipomi, Mater. Horiz., 2018, 5, 70–77.

      A. Nolin, A. Licht, K. Pierson, C.-Y. Lo, L. V. Kayser and C. Dhong, Soft Matter, 2021, 17, 5050–5060.

      A. Nolin, K. Pierson, R. Hlibok, C.-Y. Lo, L. V. Kayser and C. Dhong, Soft Matter, 2022, 18, 3928–3940.

      Z. Swain, M. Derkaloustian, K. A. Hepler, A. Nolin, V. S. Damani, P. Bhattacharyya, T. Shrestha, J. Medina, L. Kayser and C. Dhong, J. Mater. Chem. B, DOI:10.1039/D4TB01646G.

      C. Dhong, L. V. Kayser, R. Arroyo, A. Shin, M. Finn, A. T. Kleinschmidt and D. J. Lipomi, Soft Matter, 2018, 14, 7483–7491.

      A. Abdouni, M. Djaghloul, C. Thieulin, R. Vargiolu, C. Pailler-Mattei and H. Zahouani, Royal Society Open Science, DOI:10.1098/rsos.170321.

      P.-H. Cornuault, L. Carpentier, M.-A. Bueno, J.-M. Cote and G. Monteil, Journal of The Royal Society Interface, DOI:10.1098/rsif.2015.0495.

      K. Qian, K. Traylor, S. W. Lee, B. Ellis, J. Weiss and D. Kamper, J Biomech, 2014, 47, 3094–3099.

      Y.-J. Fu, H. Qui, K.-S. Liao, S. J. Lue, C.-C. Hu, K.-R. Lee and J.-Y. Lai, Langmuir, 2010, 26, 4392–4399.

      Y. Yuan and R. Verma, Colloids Surf B Biointerfaces, 2006, 48, 6–12.

      G. Yu, J. Hu, J. Tan, Y. Gao, Y. Lu and F. Xuan, Nanotechnology, 2018, 29, 115502.

      L. Zheng, S. Dong, J. Nie, S. Li, Z. Ren, X. Ma, X. Chen, H. Li and Z. L. Wang, ACS Appl. Mater. Interfaces, 2019, 11, 42504–42511.

      K. Ma, J. Rivera, G. J. Hirasaki and S. L. Biswal, Journal of Colloid and Interface Science, 2011, 363, 371–378.

      A. Mavon, H. Zahouani, D. Redoules, P. Agache, Y. Gall and Ph. Humbert, Colloids and Surfaces B: Biointerfaces, 1997, 8, 147–155.

      E. AliAbbasi, M. Muzammil, O. Sirin, P. Lefèvre, Ø. G. Martinsen and C. Basdogan, IEEE Trans. Haptics, 2024, 17, 841–849.

      G. Corniani, Z. S. Lee, M. J. Carré, R. Lewis, B. P. Delhaye and H. P. Saal, eLife, DOI:10.7554/eLife.93554.1.

      J. N. Israelachvili, Intermolecular and Surface Forces, Academic Press, 2011.

      S. Das, N. Cadirov, S. Chary, Y. Kaufman, J. Hogan, K. L. Turner and J. N. Israelachvili, J R Soc Interface, 2015, 12, 20141346.

      B. N. J. Persson, O. Albohr, C. Creton and V. Peveri, The Journal of Chemical Physics, 2004, 120, 8779–8793.

      L. Skedung, M. Arvidsson, J. Y. Chung, C. M. Stafford, B. Berglund and M. W. Rutland, Sci Rep, 2013, 3, 2617.

      K. Viswanathan, N. K. Sundaram and S. Chandrasekar, Soft Matter, 2016, 12, 5265–5275.

      M. Wiertlewski, C. Hudin and V. Hayward, in 2011 IEEE World Haptics Conference, 2011, pp. 25–30.

      M. Fehlberg, E. Monfort, S. Saikumar, K. Drewing and R. Bennewitz, IEEE Transactions on Haptics, 2024, 17, 957–963.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript discusses the role of phosphorylated ubiquitin (pUb) by PINK1 kinase in neurodegenerative diseases. It reveals that elevated levels of pUb are observed in aged human brains and those affected by Parkinson's disease (PD), as well as in Alzheimer's disease (AD), aging, and ischemic injury. The study shows that increased pUb impairs proteasomal degradation, leading to protein aggregation and neurodegeneration. The authors also demonstrate that PINK1 knockout can mitigate protein aggregation in aging and ischemic mouse brains, as well as in cells treated with a proteasome inhibitor. While this study provided some interesting data, several important points should be addressed before being further considered.

      Strengths:

      (1) Reveals a novel pathological mechanism of neurodegeneration mediated by pUb, providing a new perspective on understanding neurodegenerative diseases.

      (2) The study covers not only a single disease model but also various neurodegenerative diseases such as Alzheimer's disease, aging, and ischemic injury, enhancing the breadth and applicability of the research findings.

      Weaknesses:

      (1) PINK1 has been reported as a kinase capable of phosphorylating Ubiquitin, hence the expected outcome of increased p-Ub levels upon PINK1 overexpression. Figures 5E-F do not demonstrate a significant increase in Ub levels upon overexpression of PINK1 alone, whereas the evident increase in Ub expression upon overexpression of S65A is apparent. Therefore, the notion that increased Ub phosphorylation leads to protein aggregation in mouse hippocampal neurons is not yet convincingly supported.

      Indeed, overexpression of sPINK1* alone caused little change in Ub levels in the soluble fraction (Figure 5E), which is expected. Ub in the soluble fraction is in a relatively stable, buffered state. However, overexpression of sPINK1* resulted in an increase in Ub levels in the insoluble fraction, indicating protein aggregation. The molecular weight of Ub in the insoluble fraction was predominantly below 70 kDa, implying that phosphorylation inhibits Ub chain elongation.

      To further examine this, we used the Ub/S65A mutant to antagonize Ub phosphorylation, and found that the aggregation at low molecular weight was significantly reduced, indicating a partial restoration of proteasomal activity. The increase in Ub levels in both the soluble and insoluble fractions likely results from the high rate of ubiquitination driven by the elevated levels of Ub. Notably, the overexpressed Ub/S65A was detected in the Western blot using the wild-type Ub antibody, which accounts for the apparently increased Ub level.

      When overexpressing Ub/S65E, we again saw an increase in Ub levels in the insoluble fraction (but no increase in the soluble fraction), with low molecular weight bands even more prominent than those observed with sPINK1* transfection. These findings collectively support the conclusion that sPINK1* promotes protein aggregation through Ub phosphorylation.

      (2) The specificity of PINK1 and p-Ub antibodies requires further validation, as a series of literature indicate that the expression of the PINK1 protein is relatively low and difficult to detect under physiological conditions.

      We acknowledge the challenges in achieving optimal specificity for commercially available and custom-generated antibodies targeting PINK1 and pUb, particularly given the low endogenous levels of these proteins under physiological conditions. Despite these limitations, we observed robust immunofluorescent staining for PINK1 (Figures 1A, 1C, and 1G) and pUb (Figures 1B, 1D, and 1G) in human brain samples from Alzheimer's disease (AD) patients, as well as in mouse brains from models of AD and cerebral ischemia. The significant elevation of PINK1 and pUb under these pathological conditions likely accounts for the clear visualization. To validate antibody specificity, we have included images from pink1-/- mice as negative controls in the revised manuscript (Figure 1C and 1D, third panel).

      In addition, we detected a significant increase in pUb levels in aged mouse brains compared to young ones (Figures 1E and 1F). Notably, in pink1-/- mice, pUb levels remained unchanged between young and aged groups, despite some background signal, further supporting the conclusion that pUb accumulation during aging is PINK1-dependent.

      In HEK293 cells, pink1-/- cells served as a negative control for PINK1 (Figure 2B and 2C) and for pUb (Figure 2D and 2E). While the Western blot using the pUb antibody displayed some nonspecific background, pUb levels in pink1-/- cells remained unchanged across all MG132 treatment conditions (Figures 2D and 2E), further attesting to the reliability of our findings.

      (3) In Figure 6, relying solely on Western blot staining and Golgi staining under high magnification is insufficient to prove the impact of PINK1 overexpression on neuronal integrity and cognitive function. The authors should supplement their findings with immunostaining results for MAP2 or NeuN to demonstrate whether neuronal cells are affected.

      Thank you for raising this important point. We included NeuN immunofluorescent staining in Figure 5—figure supplement 2 of the original manuscript. The results demonstrate a significant loss of NeuN-positive cells in the hippocampus following Ub/S65E overexpression, while no apparent change in NeuN-positive cells was observed with sPINK1* transfection alone. These findings provide evidence of neuronal loss in response to Ub/S65E, further supporting the impact of pUb elevation on neuronal integrity.

      While we did not perform MAP2 immunostaining, we included complementary analyses to assess neuronal integrity. Specifically, we performed Western blotting to determine MAP2 protein levels and used Golgi staining to study neuronal morphology and synaptic structure in greater detail. These analyses revealed that overexpression of sPINK1* or Ub/S65E decreased MAP2 levels and caused damage to synaptic structures (Figures 6F and 6H). Importantly, the deleterious effects of sPINK1* overexpression could be rescued by co-expression of Ub/S65A, further underscoring the role of pUb in mediating these changes.

      Together, our NeuN immunostaining, MAP2 analysis, and Golgi staining provide strong evidence for the impact of PINK1 overexpression and pUb elevation on neuronal integrity and synaptic health. We believe these complementary approaches sufficiently address the reviewer’s concern and highlight the pathological consequences of elevated pUb levels.

      (4) The authors should provide more detailed figure captions to facilitate the understanding of the results depicted in the figures.

      Figure captions will be updated with more details in the revised manuscript.

      (5) While the study proposes that pUb promotes neurodegeneration by affecting proteasomal function, the specific molecular mechanisms and signaling pathways remain to be elucidated.

      The specific molecular mechanisms and signaling pathways through which pUb promotes neurodegeneration are likely multifaceted and interconnected. Mitochondrial dysfunction appears to be a central contributor to neurodegeneration following sPINK1* overexpression. This is supported by (1) an observed increase in full-length PINK1, indicative of impaired mitochondrial quality control, and (2) proteomic data revealing enhanced mitophagy at 30 days post-transfection and substantial mitochondrial injury by 70 days post-transfection. The progressive damage to mitochondria caused by protein aggregates can cause further neuronal injury and degeneration.

      In addition, reduced proteasomal activity may result in the accumulation of inhibitory proteins that are normally degraded by the ubiquitin-proteasome system. Our proteomics analysis identified a >54-fold increase in CamK2n1 (UniProt ID: Q6QWF9), an endogenous inhibitor of CaMKII activation, following sPINK1* overexpression. This is particularly significant because the accumulation of CamK2n1 could suppress CaMKII activation and, subsequently, inhibit the CREB signaling pathway (illustrated below). As CREB is essential for synaptic plasticity and neuronal survival, its inhibition may further amplify neurodegenerative processes.

      While our study identifies proteasomal dysfunction and mitochondrial damage as key initial triggers, downstream effects—such as disruptions in signaling pathways like CaMKII-CREB—likely contribute to a broader cascade of pathological events. These findings highlight the complexity of pUb-mediated neurodegeneration and suggest that further exploration of downstream mechanisms is necessary to fully elucidate the pathways involved.

      In the revised manuscript, we plan to include the proteomics data from mouse brain tissues at 30 and 70 days post-transfection to further highlight this downstream effect of proteasomal dysfunction.

      Author response image 1.

      Reviewer #2 (Public review):

      Summary:

      The manuscript makes the claim that pUb is elevated in a number of degenerative conditions including Alzheimer's Disease and cerebral ischemia. Some of this is based on antibody staining which is poorly controlled and difficult to accept at this point. They confirm previous results that a cytosolic form of PINK1 accumulates following proteasome inhibition and that this can be active. Accumulation of pUb is proposed to interfere with proteostasis through inhibition of the proteasome. Much of the data relies on over-expression and there is little support for this reflecting physiological mechanisms.

      Weaknesses:

      The manuscript is poorly written. I appreciate this may be difficult in a non-native tongue, but felt that many of the problems are organisational. Less data of higher quality, better controls and incision would be preferable. Overall the referencing of past work is lamentable.

      Methods are also very poor and difficult to follow.

      Until technical issues are addressed I think this would represent an unreliable contribution to the field.

      (1) Antibody specificity and detection under pathological conditions

      We acknowledge the limitations of commercially available antibodies for detecting PINK1 and pUb. Despite these challenges, our findings demonstrate a significant increase in PINK1 and pUb levels under pathological conditions, such as Alzheimer's disease (AD) and ischemia. Additionally, we observed an increase in pUb level during brain aging, further highlighting its relevance in this particular physiological process. To ensure reliable quantification of PINK1 and pUb levels, we used pink1-/- mice and HEK293 cells as negative controls. For example, PINK1 levels were extremely low in control cells but increased dramatically after 2 hours of oxygen-glucose deprivation (OGD) and 6 hours of reperfusion (Figure 1H). Together, these controls validate that the observed elevations in PINK1 and pUb are specific and linked to pathological or certain physiological conditions.

      (2) Overexpression as a model for pathological conditions

      To investigate whether the inhibitory effects of sPINK1* on the ubiquitin-proteasome system (UPS) are dependent on its kinase activity, we utilized a kinase-dead version of sPINK1* as a negative control. Since PINK1 has multiple substrates, we further explored whether its effects on UPS inhibition were mediated specifically by ubiquitin phosphorylation. For this, we used Ub/S65A (a phospho-null mutant) to antagonize Ub phosphorylation by sPINK1*, and Ub/S65E (a phospho-mimetic mutant) to mimic phosphorylated Ub. These well-defined controls ensured the robustness of our conclusions.

      While overexpression does not perfectly replicate physiological conditions, it serves as a valuable model for studying pathological scenarios such as neurodegeneration and brain aging, where pUb levels are known to increase. For example, we observed a 30.4% increase in pUb levels in aged mouse brains compared to young brains (Figure 1F). Similarly, in our sPINK1* overexpression model, pUb levels increased by 43.8% and 59.9% at 30 and 70 days post-transfection, respectively, compared to controls (Figures 5A and 5C). Notably, co-expression of sPINK1* with Ub/S65A almost entirely prevented sPINK1* accumulation (Figure 5B), indicating that an active UPS can efficiently degrade sPINK1*. Collectively, these findings show that sPINK1* accumulation inhibits UPS activity, a defect that can be rescued by the phospho-null Ub mutant. Thus, this overexpression model closely mimics pathological conditions and offers valuable insights into pUb-mediated proteasomal dysfunction.

      (3) Organization of the manuscript

      We believe the structure of the manuscript is justified and systematically addresses the key aspects of the study in a logical flow:

      (a) Evidence for the increase of PINK1 and pUb in multiple pathological and physiological conditions.

      (b) Identification of the sources and consequences of sPINK1 and pUb elevation.

      (c) Mechanistic insights into how pUb inhibits UPS-mediated degradation.

      (d) Validation of these findings using pink1-/- mice and cells.

      (e) Evidence of the reciprocal relationship between proteasomal inhibition and pUb elevation, culminating in neurodegeneration.

      (f) Demonstration of elevated pUb levels and protein aggregation in the hippocampus following sPINK1* overexpression, supported by proteomic analyses, behavioral tests, Western blotting, and Golgi staining.

      Thus, this organization provides a clear and cohesive narrative, culminating in the demonstration that sPINK1* overexpression induces hippocampal neuron degeneration.

      (4) Revisions to writing, referencing, and methodology

      We will improve the clarity and flow of the manuscript, add more references to properly acknowledge prior work, and incorporate additional details into the Methods section to enhance readability and reproducibility. These improvements should address the organizational and technical concerns raised, while strengthening the overall quality of the manuscript.

      Reviewer #3 (Public review):

      Summary:

      This study aims to explore the role of phosphorylated ubiquitin (pUb) in proteostasis and its impact on neurodegeneration. By employing a combination of molecular, cellular, and in vivo approaches, the authors demonstrate that elevated pUb levels contribute to both protective and neurotoxic effects, depending on the context. The research integrates proteasomal inhibition, mitochondrial dysfunction, and protein aggregation, providing new insights into the pathology of neurodegenerative diseases.

      Strengths:

      - The integration of proteomics, molecular biology, and animal models provides comprehensive insights.

      - The use of phospho-null and phospho-mimetic ubiquitin mutants elegantly demonstrates the dual effects of pUb.

      - Data on behavioral changes and cognitive impairments establish a clear link between cellular mechanisms and functional outcomes.

      Weaknesses:

      - While the study discusses the reciprocal relationship between proteasomal inhibition and pUb elevation, causality remains partially inferred.

      The reciprocal cycle between proteasomal inhibition and pUb elevation can be initiated by various factors that impair proteasomal activity. These factors include Aβ accumulation, ATP depletion, reduced expression of proteasome components, and covalent modifications of proteasomal subunits—all well-established contributors to the progressive decline in proteasome function. Once initiated, this cycle would become self-perpetuating, with the accumulation of sPINK1 and pUb driving a feedback loop of deteriorating proteasomal activity.

      In the current study, this reciprocal relationship between sPINK1/pUb elevation and proteasomal dysfunction is depicted in Figure 4A. Our results demonstrate that increased sPINK1 or PINK1 levels, such as through overexpression, can initiate this cycle. Crucially, co-expression of Ub/S65A effectively rescues the cells from this cycle, highlighting the pivotal role of pUb in driving proteasomal inhibition and establishing causality in this relationship. At the animal level, pink1 knockout could prevent protein aggregation upon aging and cerebral ischemia (Figures 1E and 1G).

      Mitochondrial injury is a likely source of elevated PINK1 and pUb levels. A recent study showed that efficient mitophagy is necessary to prevent pUb accumulation (bioRxiv 2023.02.14.528378), suggesting that mitochondrial damage can trigger this cycle. In another study (bioRxiv 2024.07.03.601901), the authors found that mitochondrial damage could enhance PINK1 transcription, further increasing cytoplasmic PINK1 levels and exacerbating the cycle.

      - The role of alternative pathways, such as autophagy, in compensating for proteasomal dysfunction is underexplored.

      Elevated sPINK1 has been reported to enhance autophagy (Autophagy 2016, 12: 632-647), potentially compensating for the impaired UPS. One mechanism involves the phosphorylation of p62 by sPINK1, which enhances autophagy activity. In our study, we did observe increased autophagic activity upon sPINK1* overexpression, as shown in Figure 2I (middle panel, without BALA). This increased autophagy may help degrade ubiquitinated proteins induced by puromycin, partially compensating for the proteasomal dysfunction.

      This compensation might explain why protein aggregation increased only slightly, though statistically significantly, at 70 days post sPINK1* transfection (Figure 5F). Additionally, we observed a slight, though statistically insignificant, increase in LC3II levels in the hippocampus of mouse brains at 70 days post sPINK1* transfection (Figure 5—figure supplement 6), further supporting the notion of autophagy activation.

      However, while autophagy may provide some compensation, its effect is likely limited. Autophagy and UPS differ significantly in their roles and mechanisms of degradation. Autophagy is a bulk degradation pathway that is generally non-selective, targeting long-lived proteins, damaged organelles, and intracellular pathogens. In contrast, the UPS is highly selective, primarily degrading short-lived regulatory proteins, misfolded proteins, and proteins tagged for degradation.

      Together, we found that sPINK1* overexpression enhanced autophagy-mediated protein degradation while simultaneously impairing UPS-mediated degradation. This suggests that while autophagy may provide partial compensation for proteasomal dysfunction, it is not sufficient to fully counterbalance the selective degradation functions of the UPS.

      - The immunofluorescence images in Figure 1A-D lack clarity and transparency. It is not clear whether the images represent human brain tissue, mouse brain tissue, or cultured cells. Additionally, the DAPI staining is not well-defined, making it difficult to discern cell nuclei or staging. To address these issues, lower-magnification images that clearly show the brain region should be provided, along with improved DAPI staining for better visualization. Furthermore, the Results section and Figure legends should explicitly indicate which brain region is being presented. These concerns raise questions about the reliability of the reported pUb levels in AD, which is a critical aspect of the study's findings.

      We will include low-magnification images in the supplementary figures of the revised manuscript to provide a broader context for the immunofluorescence data presented in Figure 1. DAPI staining at higher magnifications will also be provided to improve visualization of cell nuclei and overall tissue structure. Additionally, we will indicate the brain regions examined in the corresponding figure legends, and incorporate more details in the Results section to provide clearer descriptions of the samples and brain regions analyzed.

      The human brain samples presented in Figure 1 are from the cingulate gyrus region of Alzheimer's disease (AD) patients. Our analysis revealed that PINK1 is primarily localized within cell bodies, while pUb is more abundant around Aβ plaques, likely in nerve terminals. These additional clarifications and supplementary figures should provide greater transparency and improve the reliability of our findings.

      - Figure 4B should also indicate which brain region is being presented.

      The images were taken from layers III-IV of the mouse neocortex; this information will be incorporated into the figure legend of the revised manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      Drawing on insights from preceding studies, the researchers pinpointed mutations within the spag7 gene that correlate with metabolic aberrations in mice. The precise function of spag7 has not been fully described yet, thereby the primary objective of this investigation is to unravel its pivotal role in the development of obesity and metabolic disease in mice. First, they generated a mouse model lacking spag7 and observed that KO mice exhibited diminished birth size, which subsequently progressed to manifest obesity and impaired glucose tolerance upon reaching adulthood. This behaviour was primarily attributed to a reduction in energy expenditure. In fact, KO animals demonstrated compromised exercise endurance and muscle functionality, stemming from a deterioration in mitochondrial activity. Intriguingly, none of these effects was observed when using a tamoxifen-induced KO mouse model, implying that Spag7's influence is predominantly confined to the embryonic developmental phase. Explorations within placental tissue unveiled that mice afflicted by Spag7 deficiency experienced placental insufficiency, likely due to aberrant development of the placental junctional zone, a phenomenon that could impede optimal nutrient conveyance to the developing fetus. Overall, the authors assert that Spag7 emerges as a crucial determinant orchestrating accurate embryogenesis and subsequent energy balance in the later stages of life.

      The study boasts several noteworthy strengths. Notably, it employs a combination of animal models and a thorough analysis of metabolic and exercise parameters, underscoring a meticulous approach. Furthermore, the investigation encompasses a comprehensive evaluation of fetal loss across distinct pregnancy stages, alongside a transcriptomic analysis of skeletal muscle, thereby imparting substantial value. However, a pivotal weakness of the study centres on its translational applicability. While the authors claim that "SPAG7 is well-conserved with 97% of the amino acid sequence being identical in humans and mice", the precise role of spag7 in the human context remains enigmatic. This limitation hampers a direct extrapolation of findings to human scenarios. Additionally, the study's elucidation of the molecular underpinnings behind the spag7-mediated anomalous development of the placental junction zone remains incomplete. Finally, the hypothesis positing a reduction in nutrient availability to the fetus, though intriguing, requires further substantiation, leaving an aspect of the mechanism unexplored.

      Hence, in order to fortify the solidity of their conclusions, these concerns necessitate meticulous attention and resolution in the forthcoming version of the manuscript. Upon the comprehensive addressing of these aspects, the study is poised to exert a substantial influence on the field, its significance reverberating significantly. The methodologies and data presented undoubtedly hold the potential to facilitate the community's deeper understanding of the ramifications stemming from disruptions during pregnancy, shedding light on their enduring impact on the metabolic well-being of subsequent generations.

      Thanks to this reviewer for their thoughtful analysis and commentary. Human mutations in SPAG7 are exceedingly rare (SPAG7 | pLoF (genebass.org)), potentially because of the deleterious effects of SPAG7-deficiency on prenatal development. This makes investigation into the causative effects of SPAG7 in humans challenging. There exist mutations in the SPAG7 region of the genome that are associated with BMI, but no direct coding variants within the spag7 gene itself have been studied.

      We agree with the reviewer that the precise role of spag7 in the placenta remains unknown. However, given its robust expression and high protein levels in the placenta, including in key cells, such as the syncytiotrophoblast (https://www.proteinatlas.org/ENSG00000091640-SPAG7/tissue/Placenta), it is highly likely that spag7 is critical for normal placental development and function. Multiple studies (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9716072/) have recently shown that sperm-associated RNAs play a critical role in embryonic and early placental development. Our findings will provide the basis for future studies that can elucidate the role of spag7 in the human placenta.

      Reviewer #2 (Public Review):

      Summary: The authors of this manuscript are interested in discovering and functionally characterizing genes that might cause obesity. To find such genes, they conducted a forward genetic screen in mice, selecting strains which displayed increased body weight and adiposity. They found a strain, with germ-line deficiency in the gene Spag7, which displayed significantly increased body weight, fat mass, and adipose depot sizes manifesting after the onset of adulthood (20 weeks). The mice also display decreased organ sizes, leading to decreased lean body mass. The increased adiposity was traced to decreased energy expenditure at both room temperature and thermoneutrality, correlating with decreased locomotor activity and muscle atrophy. Major metabolic abnormalities such as impaired glucose tolerance and insulin sensitivity also accompanied the phenotype. Unexpectedly, when the authors generated an inducible, whole body knockout mouse using a globally expressed Cre-ERT2 along with a globally floxed Spag7, and induced Spag7 knockout before the onset of obesity, none of the phenotypes seen in the original strain were recapitulated. The authors trace this discrepancy to the major effect of Spag7 being on placental development.

      Strengths: Strengths of the manuscript are its inherently unbiased approach, using a forward genetic screen to discover previously unknown genes linked to obesity phenotypes. Another strong aspect of the work was the generation of an independent, complementary, strain consisting of an inducible knockout model, in which the deficiency of the gene could be assessed in a more granular form. This approach enabled the discovery of Spag7 as a gene involved in the establishment of the mature placenta, which determines the metabolic fate of the offspring. Additional strengths include the extensive array of physiological parameters measured, which provided a deep understanding of the whole-body metabolic phenotype and pinpointed its likely origin to muscle energetic dysfunction.

      Weaknesses: Weaknesses that can be raised are the lack of molecular mechanistic understanding of the numerous phenotypic observations. For example, the specific role of Spag7 to promote placental development remains unclear. Also, the reason why placental developmental abnormalities lead to muscle dysfunction, and whether indeed the entire metabolic phenotype of the offspring can be attributed solely to decreased muscle energetics is not fully explored.

      Overall, the authors achieved a remarkable success in identifying genes associated with development of obesity and metabolic disease, discovering the role of Spag7 in placental development, and highlighting the fundamental role of in-utero development in setting future metabolic state of the offspring.

      We thank this reviewer for their thoughtful analysis and commentary. Significant effort has been made to understand the causes of the metabolic phenotypes observed in SPAG7-deficient mouse models. It is clear that hyperphagia is not the cause and the muscle energetics deficit is likely not the sole cause. We expect that decreased access to nutrition in utero will lead to widespread and varied metabolic adaptation.

      We agree with the reviewer that further work can be done to understand the molecular mechanism driving the metabolic phenotypes of SPAG7-deficient animals. We believe that a full investigation of the processes behind the developmental abnormalities is beyond the scope of this paper and is best addressed in a separate study.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript by Flaherty III S.E. et al identified SPAG7 gene in their forward mutagenetic screening and created the germline knockout and inducible knockout mice. The authors reported that the SPAG7 germline knockout mice had lower birth weight likely due to intrauterine growth restriction and placental insufficiency. The SPAG7 KO mice later developed obesity phenotype as a result of reduced energy expenditure. However, the inducible SPAG7 knockout mice had normal body weight and composition.

      Strengths:

      In this reviewer's opinion, this study has high significance in the field of metabolic research for the following reasons.

      (1) The authors' findings are significant in the field of obesity research, especially from the perspective of maternal-fetal medicine. The authors created and analyzed the SPAG7 KO mice and found that the KO mice had a "thrifty phenotype" and developed obesity.

      (2) SPAG7 gene function hasn't been thoroughly studied. The reported phenotype will fill the gap of knowledge.

      Overall, the authors have presented their results in a clear and logically organized structure, clearly stated the key question to be addressed, used the appropriate methodology, produced significant and innovative main findings.

      Weaknesses:

      The manuscript can be further strengthened with more clarification on the following points.

      1) The germline whole-body KO mice were female mice (Line293), however the inducible knockout mice were male mice (Line549). Sexual dimorphism is often observed in metabolic studies, therefore the metabolic phenotype of both female and male mice needs to be reported for the germline and inducible knockouts in order to make the justified conclusion.

      We thank the reviewer for their thoughtful analysis and commentary. All inducible KO animals described in the paper are female (the typo in Line 549 has been corrected). We performed studies in both male and female animals for both of these lines. Males display similar metabolic phenotypes, though not as robustly as the females. Key data from male and female germline KO and inducible KO animals are summarized in Author response table 1.

      Author response table 1.

      2) SPAG7 has an NLS. Does this protein function in gene expression? Whether the overall metabolic phenotype is the direct cause of SPAG7 ablation is unclear. For example, the Hsd17b10 gene was downregulated in all tissues in the KO mice. Could this have been coincidentally selected for and thus be the cause of the developmental issues and adulthood obesity? Do the iSpag7 mice demonstrate reduced expression of Hsd17b10?

      SPAG7 contains an R3H domain, which is predicted to bind polynucleotides, and other proteins that contain R3H domains are known to bind RNA or ssDNA. The iSPAG7 mice do display decreased Hsd17b10 expression in the tissues examined, though to a lesser degree than the germline KOs. When we knock down SPAG7 in specific tissues, Hsd17b10 expression also decreases specifically in those tissues. Together, these data suggest that Hsd17b10 expression is, at the least, linked to Spag7 expression. They also raise the question of why the iSPAG7 animals have no metabolic phenotype. Possible explanations are that Hsd17b10 expression is essential only during early development, or that the smaller downregulation of Hsd17b10 in the iSPAG7 mice is insufficient to produce the metabolic phenotypes seen in the germline KOs, where the downregulation is larger.

      3) Figure 2c should display the energy expenditure normalized to body weight (or lean body mass).

      How best to normalize total energy expenditure data is a subject of debate within the energy expenditure field. As the animals have increased body weight and decreased lean mass, normalizing to either will skew the results in different directions. We have included the data normalized to body weight and to lean mass in Author response image 1. The decrease in total energy expenditure remains significant in either scenario.

      Author response image 1.
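
      The point about normalization can be illustrated with a minimal sketch. All numbers below are hypothetical and are not taken from the manuscript; they only show that when body weight rises and lean mass falls, dividing the same absolute energy expenditure by each denominator moves the apparent group difference in opposite directions.

```python
# Hypothetical animals (illustrative values only, not data from the study).
# Both groups are given the SAME absolute energy expenditure so that any
# apparent difference comes purely from the choice of denominator.
wild_type = {"ee_kcal_per_day": 12.0, "body_weight_g": 30.0, "lean_mass_g": 22.0}
knockout = {"ee_kcal_per_day": 12.0, "body_weight_g": 40.0, "lean_mass_g": 20.0}


def normalized_ee(animal: dict, denominator: str) -> float:
    """Return energy expenditure divided by the chosen mass (kcal/day/g)."""
    return animal["ee_kcal_per_day"] / animal[denominator]


# Per gram of body weight, the heavier knockout appears to expend LESS energy.
wt_per_bw = normalized_ee(wild_type, "body_weight_g")
ko_per_bw = normalized_ee(knockout, "body_weight_g")

# Per gram of lean mass, the same knockout appears to expend MORE energy.
wt_per_lean = normalized_ee(wild_type, "lean_mass_g")
ko_per_lean = normalized_ee(knockout, "lean_mass_g")

print(f"per body weight: WT {wt_per_bw:.3f}, KO {ko_per_bw:.3f}")
print(f"per lean mass:   WT {wt_per_lean:.3f}, KO {ko_per_lean:.3f}")
```

      A result that holds under both normalizations, as reported above, is therefore less likely to be an artifact of the denominator.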

      4) Please provide more information for the figure legend, including the statistical test that was conducted for each data set, animal numbers for each genotype and sexes.

      This information has been added to all figures.

      5) The authors should report how long after treatment the data was collected for figures 4F-M.

      Weeks after treatment have been added to the figure legends for Figures 4F-M.

      6) The authors should justify ending the data collection after 8 weeks for the iSPAG7 mice in Figures 4C-E. In the WT vs germline KO mice, there was no clear difference in body weight or lean mass at 15 weeks of age.

      Highly significant changes in fat mass, glucose tolerance, and insulin sensitivity are already present in the germline SPAG7 KO mice at 15 weeks of age or earlier. Tamoxifen injection effectively induced SPAG7 knockout in less than a week in the iSPAG7 KO mice. Given the absence of significant changes, or of any trend toward significance, in glucose and insulin tolerance tests and other metabolic tests in the iSPAG7 KO mice at 15 weeks of age (the age at which these changes were observed in the germline KO) and 8 weeks after SPAG7 knockout, we did not expect changes to emerge beyond this point and decided to stop the study at 9 weeks after treatment.

    1. Author Response:

      This work presents valuable information about the specificity and promiscuity of toxic effector and immunity protein pairs. The evidence supporting the claims of the authors is currently incomplete, as there is concern about the methodology used to analyze protein interactions, which did not take potential differences in expression levels, protein folding, and/or transient interaction into account. Other methods to measure the strength of interactions and structural predictions would improve the study. The work will be of interest to microbiologists and biochemists working with toxin-antitoxin and effector-immunity proteins.

      We thank the reviewers for considering this manuscript. We agree that this manuscript provides a valuable and cross-discipline introduction to new EI pair protein families where we focus on the EI pair’s flexibility and impacts on community structure. As such, we believe we have provided a solid foundation for future studies to examine non-cognate interactions and their possible effects on microbial communities. This, by definition, leaves some areas “incomplete” and, therefore, open for further investigations. While the methods we show do take into account potential differences in binding assays, we will more explicitly address how “expression, protein folding, and/or transient binding” may play into this expanded EI pair model upon revision and temper the discussion of the proposed model. We have responded to the reviewers’ public comments (italicized below).

      Public Reviews:

      Note: Reviewer 1, who appeared to focus on a subset of the manuscript rather than the whole, based their comments on several inaccuracies, which we discuss below. We found the tone in this reviewer's comments to be, at times, inappropriate, e.g., using "harsh" and "simply too drastic" to imply that common structure-function analyses were outside of the field-standard methods. We also note that the reviewer took a somewhat atypical step in reviewing this manuscript by running and analyzing the potential protein-complex data in AlphaFold2 but did not discuss areas of low confidence within that model that may contradict their conclusions. We are concerned their approach muddled valid scientific criticisms with problematic conclusions.

      Reviewer #1 (Public Review):

      In this manuscript, Knecht, Sirias et al describe toxin-immunity pair from Proteus mirabilis. Their observations suggest that the immunity protein could protect against non-cognate effectors from the same family. They analyze these proteins by dissecting them into domains and constructing chimeras which leads them to the conclusion that the immunity can be promiscuous and that the binding of immunity is insufficient for protective activity.

      Strengths:

      The manuscript is well written and the data are very well presented and could be potentially interesting. The phylogenetic analysis is well done, and provides some general insights.

      Weaknesses:

      1) Conclusions are mostly supported by harsh deletions and double hybrid assays. The latter assays might show binding, but this method is not resolutive enough to report the binding strength. Proteins could still bind, but the binding might be weaker, transient, and out-competed by the target binding.

      The phrasing of structure-function analyses as “harsh” is a bit unusual, as other research groups regularly use deletions and hybrid studies. Given the known caveats to deletion and domain substitutions, we included point-mutation analyses for both the effector and immunity proteins, as found on lines 105 - 113 and 255 - 261 in the current manuscript. These caveats are also why we coupled the in vitro binding analyses with in vivo protection experiments in two distinct experimental systems (E. coli and P. mirabilis). Based on this manuscript’s introductory analysis (where we define and characterize the genes, proteins, interactions, phylogenetics, and incidences in human microbiomes), the next apparent questions are beyond the scope of this study. Future approaches would include analyzing purified proteins from these effector (E) and immunity (I) protein families using biochemical and biophysical methods such as X-ray crystallography and circular dichroism spectroscopy.

      (Interestingly, most papers in the EI field do not measure EI protein affinity (Jana et al., 2019, Yadav et al., 2021). Notable exceptions are earlier colicin research (Wallis et al., 1995) and a new T6SS EI paper (Bosch et al., 2023) published as we submitted this manuscript.)

      2) While the authors have modeled the structure of toxin and immunity, the toxin-immunity complex model is missing. Such a model allows alternative, more realistic interpretation of the presented data. Firstly, the immunity protein is predicted to bind contributing to the surface all over the sequence, except the last two alpha helices (very high confidence model, iPTM>0.8). The N terminus described by the authors contributes one of the toxin-binding surfaces, but this is not the sole binding site. Most importantly, other parts of the immunity protein are predicted to interact closer to the active site (D-E-K residues). Thus, based on the AlphaFold model, the predicted mechanism of immunization remains physically blocking the active site. However, removing the N terminal part, which contributes large interaction surface will directly impact the binding strength. Hence, the toxin-immunity co-folding model suggests that proper binding of immunity, contributed by different parts of the protein, is required to stabilize the toxin-immunity complex and to achieve complete neutralization. Alternative mechanisms of neutralization might not be necessary in this case and are difficult to imagine for a DNAse.

      In response to the reviewer’s comment, we again reviewed the RdnE-RdnI AlphaFold2 complex predictions with the most recent version of ColabFold (1.5.2-patch with PDB100 and MMseqs2) and have included them at the end of the responses [1].

      However, the literature reports that computational predictions of E-I complexes often do not match experimental structural results (Hespanhol et al., 2022, Bosch et al., 2023). As such, we chose not to include the predicted cognate and non-cognate RdnE-I complexes from ColabFold (which uses AlphaFold2) and will not include this data in revised manuscripts. (It is notable that reviewer 1 found the proposed expanded model and research so interesting as to directly input and examine the AI-predicted RdnE-RdnI protein interactions in AlphaFold2.)

      Discussion of the prevailing toxin-immunity complex model is in the introduction (lines 45-48) and Figure 5E. Further, there are various known mechanisms for neutralizing nucleases and other T6SS effectors, which we briefly state in the discussion (lines 359 - 361). More in-depth, these molecular mechanisms include active-site blocking (Benz et al., 2012), allosteric-site binding (Kleanthous et al., 1999 and Lu et al., 2014), enzymatic neutralization of the target (Ting et al., 2021), and structural disruption of both the active and binding sites (Bosch et al., 2023). Given this diversity of mechanisms, we did not presume to speculate on the as-of-yet unknown mechanism of RdnI protection.

      3) Dissection of a toxin into two domains is also not justified from a structural point of view, it is probably based on initial sequence analyses. The N terminus (actually previously reported as Pone domain in ref 21) is actually not a separate domain, but an integral part of the protein that is encased from both sides by the C terminal part. These parts might indeed evolve faster since they are located further from the active site and the central core of the protein. I am happy to see that the chimeric toxins are active, but regarding the conservation and neutralization, I am not surprised, that the central core of the protein fold is highly conserved. However, "deletion 2" is quite irrelevant - it deletes the central core of the protein, which is simply too drastic to draw any conclusions from such a construct - it will not fold into anything similar to an original protein, if it will fold properly at all.

      The reviewer’s comment highlights why we turned to the chimera proteins to dissect the regions of RdnE (formerly IdrD-CT), as the deletions could result in misfolded proteins. (We initially examined RdnE in the years before the launch of AlphaFold2.) However, the reviewer is incorrect regarding the N-terminus of RdnE. The PoNe domain, while also a subfamily of the PD-(D/E)XK superfamily, forms a distinct clade of effectors from the PD-(D/E)XK domain in RdnE, as seen in Hespanhol et al., 2022; this is true for other DNAse effectors as well. Many studies analyzing effectors within the PD-(D/E)XK superfamily focus only on the PD-(D/E)XK domain, removing just this domain from the context of the whole protein (Hespanhol et al., 2022; Jana et al., 2019). Of note, in RdnE, this region alone (containing the DNA-binding domain) is insufficient for DNAse activity (unlike in PoNe).

      4) Regarding the "promiscuity" there is always a limit to how similar proteins are, hence when cross-neutralization is claimed authors should always provide sequence similarities. This similarity could also be further compared in terms of the predicted interaction surface between toxin and immunity.

      Reviewer 1 points out a fundamental property of protein-protein interactions that has been isolated away from the impacts of such interactions on bacterial community structure. We have provided the whole protein alignments in supplemental figure 3, the summary images in Figure 3D, and the protein phylogenetic trees in Figure 3C. We encourage others to consider the protein alignments as percent amino acid sequence similarity is not necessarily a good gauge for protein function and interactions. RuBisCo is one example of how protein sequence similarity can be small while functions remain highly conserved. These data are publicly available on the OSF website associated with this manuscript https://osf.io/scb7z/, and we hope the community explores the data there.

      In consideration of the enthusiasm to deeply dive into the primary research data, we have included the pairwise sequence identities across the entire proteins here: Proteus RdnI vs. Rothia RdnI: 23.6%; Proteus RdnI vs. Prevotella RdnI: 16.3%; Proteus RdnI vs. Pseudomonas RdnI: 14.6%; Rothia RdnI vs. Prevotella RdnI: 22.4%; Rothia RdnI vs. Pseudomonas RdnI: 17.6%; Prevotella RdnI vs. Pseudomonas RdnI: 19.5%. (As stated in response to reviewer 1 comment 2, we do not find it appropriate to make inferences based on AlphaFold2-predicted protein complexes.)
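
      For readers who wish to recompute such values from the deposited alignments, a minimal sketch of one common definition of pairwise percent identity follows. The convention used here (matches divided by alignment columns in which both sequences have a residue) is an assumption; other denominators, such as full alignment length or the shorter sequence length, give different percentages, and the response does not state which was used. The sequences in the example are toy strings, not RdnI sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two rows of a pairwise alignment.

    Columns where either sequence has a gap ('-') are skipped, so the
    denominator is the number of columns with a residue in both rows.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == "-" or res_b == "-":
            continue  # gap column: not counted under this convention
        compared += 1
        if res_a == res_b:
            matches += 1

    return 100.0 * matches / compared if compared else 0.0


# Toy example: 5 gap-free columns, 4 of them identical.
toy_identity = percent_identity("MKT-AL", "MRTGAL")
print(f"{toy_identity:.1f}%")
```

      Because the result depends on both the alignment and the denominator, identities reported by different tools are comparable only when the same convention is used.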

      Overall, it looks more like a regular toxin-immunity couple, where some cross-reactions with homologues are possible, depending on how far the sequences have deviated. Nevertheless, taking all of the above into account, these results do not challenge toxin-immunity specificity dogma.

      In this manuscript, we did not intend to dismiss the E-I specificity model but rather point out its limitations and propose an important expansion of that model that accounts for cross-protection and survival against attacks from other genera. We agree that it is commonly considered that deviations in amino acid sequence over time could result in cross-binding and protection (see lines 364-368). However, the impacts of such cross-binding on community structure, bacterial survival, and strain evolution have rarely been considered or addressed in prior literature, with exceptions such as in Zhang et al., 2013 and Bosch et al., 2023. One key insight we propose and show in this manuscript is that cross-binding can be a fitness benefit in mixed communities; therefore, it could be selected for evolutionarily (lines 378-380), even potentially in host microbiomes.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Knecht et al entitled "Non-cognate immunity proteins provide broader defenses against interbacterial effectors in microbial communities" aims at characterizing a new type VI secretion system (T6SS) effector immunity pair using genetic and biochemical studies primarily focused on Proteus mirabilis and metagenomic analysis of human-derived data focused on Rothia and Prevotella sequences. The authors provide evidence that RdnE and RdnI of Proteus constitute an E-I pair and that the effector likely degrades nucleic acids. Further, they provide evidence that expression of non-cognate immunity derived from diverse species can provide protection against RdnE intoxication. Overall, this general line of investigation is underdeveloped in the T6SS field and conceptually appropriate for a broad audience journal. The paper is well-written and, aside from a few cases, well-cited. As detailed below however, there are several aspects of this paper where the evidence provided is somewhat insufficient to support the claims. Further, there are now at least two examples in the literature of non-cognate immunity providing protection against intoxication, one of which is not cited here (Bosch et al PMID 37345922 - the other being Ting et al 2018). In general therefore I think that the motivating concept here in this paper of overturning the predominant model of interbacterial effector-immunity cognate interactions is oversold and should be dialed back.

      We agree that analyses focusing on flexible non-cognate interactions and protection are underdeveloped within the T6SS field and are not fully explored within a community structure. These ideas are rapidly growing in the field, as evidenced by the references provided by the reviewer. As stated earlier, we did not intend to overturn the prevailing model but rather propose an expanded model that accounts for protection against attacks from foreign genera.

      Strengths:

      One of the major strengths of this paper is the combination of diverse techniques including competition assays, biochemistry, and metagenomics surveys. The metagenomic analysis in particular has great potential for understanding T6SS biology in natural communities. Finally, it is clear that much new biology remains to be discovered in the realm of T6SS effectors and immunity.

      Weaknesses:

      The authors have not formally shown that RdnE is delivered by the T6SS. Is it the case that there are not available genetics tools for gene deletion for the BB2000 strain? If there are genetic tools available, standard assays to demonstrate T6SS-dependency would be to interrogate function via inactivation of the T6SS (e.g. by deleting tssC).

      Our research group showed that the T6SS secretes RdnE (previously IdrD) in Wenren et al., 2013 (cited in lines 71-73). We later confirmed T6SS-dependent secretion by LC-MS/MS (Saak et al., 2017).

      For swarm cross-phyla competition assays (Figure 4), at what level compared to cognate immunity are the non-cognate immunity proteins being expressed? This is unclear from the methods and Figure 4 legend and should be elaborated upon. Presumably these non-cognate immunity proteins are being overexpressed. Expression level and effector-to-immunity protein stoichiometry likely matters for interpretation of function, both in vitro as well as in relevant settings in nature. It is important to assess if native expression levels of non-cognate cross-phyla immunity (e.g. Rothia and Prevotella) protect similarly as the endogenously produced cognate immunity. This experiment could be performed in several ways, for example by deleting the RdnE-I pair and complementing back the Rothia or Prevotella RdnI at the same chromosomal locus, then performing the swarm assay. Alternatively, if there are inducible expression systems available for Proteus, examination of protection under varying levels of immunity induction could be an alternate way to address this question. Western blot analysis comparing cognate to non-cognate immunity protein levels expressed in Proteus could also be important. If the authors were interested in deriving physical binding constants between E and various cognate and non-cognate I (e.g. through isothermal titration calorimetry) that would be a strong set of data to support the claims made. The co-IP data presented in supplemental Figure 6 are nice but are from E. coli cells overexpressing each protein and do not fully address the question of in vivo (in Proteus) native expression.

      P. mirabilis strain ATCC29906 does not encode the rdnE and rdnI genes on the chromosome (NCBI BioSample: SAMN00001486) (line 151). Production of the RdnI proteins, including the cognate Proteus RdnI, comes from equivalent transgenic expression vectors. Specifically, the rdnI genes were expressed under the flaA promoter in P. mirabilis strain ATCC29906 (Table 1) for the swarm competition assays found in Figure 2C and Figure 4. This promoter results in constitutive expression in swarming cells (Belas et al., 1991; Jansen et al., 2003).

      Lines 321-324, the authors infer differences between E and I in terms of read recruitment (greater abundance of I) to indicate the presence of orphan immunity genes in metagenomic samples (Figure 5A-D). It seems equally or perhaps more likely that there is substantial sequence divergence in E compared to the reference sequence. In fact, metagenomes analyzed were required only to have "half of the bases on reference E-I sequence receiving coverage". Variation in coverage again could reflect divergent sequence dipping below 90% identity cutoff. I recommend performing metagenomic assemblies on these samples to assess and curate the E-I sequences present in each sample and then recalculating coverage based on the exact inferred sequences from each sample.

      This comment raises the challenges with metagenomic analyses. It was difficult to balance specificity to a particular species’ DNA sequence with the prevalence of any homologous sequence in the sample. Given the distinction in binding interactions among the four species examined, we opted to prioritize specificity, accepting that we were losing access to some rdnE and rdnI sequences in that decision. We chose a 90% identity cutoff, which, through several in silico controls, ensured that each sequence we identified was the rdnE or rdnI gene from that specific species. For the Version of Record, we will revisit this decision and consider trying to account for sequence divergence by lowering the identity cutoffs as suggested.
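
      The trade-off described above can be sketched in a few lines. This is an illustration only, not the pipeline used in the manuscript: the hit representation, the function name, and the example values are all hypothetical. It shows how an identity cutoff combined with a breadth-of-coverage requirement (here, half of the reference bases) can cause a divergent but genuinely present gene to drop out.

```python
from typing import Iterable, Tuple

# A hit is (start, end_exclusive, fractional_identity) on the reference gene.
Hit = Tuple[int, int, float]


def breadth_of_coverage(hits: Iterable[Hit], ref_len: int,
                        min_identity: float = 0.90) -> float:
    """Fraction of reference bases covered by hits passing the identity cutoff."""
    covered = [False] * ref_len
    for start, end, identity in hits:
        if identity < min_identity:
            continue  # read is too divergent from the reference: discarded
        for pos in range(max(start, 0), min(end, ref_len)):
            covered[pos] = True
    return sum(covered) / ref_len


# Hypothetical reads mapped to a 100-bp reference; the third read is divergent.
example_hits = [(0, 40, 0.95), (30, 60, 0.92), (60, 100, 0.85)]

strict = breadth_of_coverage(example_hits, ref_len=100, min_identity=0.90)
relaxed = breadth_of_coverage(example_hits, ref_len=100, min_identity=0.80)
print(f"breadth at 90% identity: {strict:.2f}")
print(f"breadth at 80% identity: {relaxed:.2f}")
print("passes half-coverage rule at 90%:", strict >= 0.5)
```

      Lowering the cutoff recovers divergent reads at the cost of species specificity, which is the balance discussed in the response.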

      A description of gene-level read recruitment in the methods section relating to metagenomic analysis is lacking and should be provided.

      Noted. We will also include the raw code and sequences on the OSF website associated with this manuscript https://osf.io/scb7z/.

      Reviewer #3 (Public Review):

      [...] Strengths:

      The authors presented a strong rationale in the manuscript and characterized the molecular mechanism of the RdnE effector both in vitro and in the heterologous expression model. The utilization of the bacterial two-hybrid system, along with the competition assays, to study the protective action of RdnI immunity is informative. Furthermore, the authors conducted bioinformatic analyses throughout the manuscript, examining the primary sequence, predicted structural, and metagenomic levels, which significantly underscore the significance and importance of the EI pair.

      Weaknesses:

      1. The interaction between RdnI and RdnE appears to be complex and requires further investigation. The manuscript's data does not conclusively explain how RdnI provides a "promiscuous" immunity function, particularly concerning the RdnI mutant/chimera derivatives. The lack of protection observed in these cases might be attributed to other factors, such as a decrease in protein expression levels or misfolding of the proteins. Additionally, the transient nature of the binding interaction could be insufficient to offer effective defenses.

      Yes, we agree with the reviewer and hope that grant reviewers share this colleague’s enthusiasm for understanding the detailed molecular mechanisms of RdnE-RdnI binding across genera. We will continue to emphasize such caveats, as the next frontier is clearly understanding the molecular mechanisms of cognate and non-cognate RdnI protection. We address the concerns regarding expression levels in the response to reviewer 2, comment 2.

      2. The results from the mixed population competition lack quantitative analysis. The swarm competition assays only yield binary outcomes (Yes or No), limiting the ability to obtain more detailed insights from the data.

      The mixed swarm assay is needed when studying T6SS effectors that are primarily secreted during Proteus’ swarming activity (Saak et al., 2017; Zepeda-Rivera et al., 2018). This limitation is one reason we utilize in vitro, in vivo, and bioinformatic analyses. Though the swarm competition assay yields a binary outcome, we are confident that the observed RdnI protection is due to interaction with a trans-cell RdnE via an active T6SS. By contrast, many manuscripts report co-expression of the EI pair (Yadav et al., 2021; Hespanhol et al., 2022) rather than secreted effectors, as we have achieved in this manuscript.

      3. The discovery of cross-species protection is solely evident in the heterologous expression-competition model. It remains uncertain whether this is an isolated occurrence or a common characteristic of RdnI immunity proteins across various scenarios. Further investigations are necessary to determine the generality of this behavior.

      We agree, which is why we submitted this paper as a launching point for further investigations into the generality of non-cognate interactions and their potential impact on community structure.

      Comments from Reviewing Editor:

      • In addition to the references provided by reviewer#2, the first manuscript to show non-cognate binding of immunity proteins was Russell et al 2012 (PMID: 22607806).
      • IdrD was shown to form a subfamily of effectors in this manuscript by Hespanhol et al 2022 PMID: 36226828 that analyzed several T6SS effectors belonging to PDDExK, and it should be cited.

      We appreciate that the reviewer and eLife staff pointed out missed citations. A revised manuscript will incorporate those studies and cite them appropriately.

      [1] The Proteus RdnE in complex with either the Prevotella or Pseudomonas RdnI showed low confidence at the interface (pLDDT ~50-70%); this AI-predicted complex might support the lack of binding seen in the bacterial two-hybrid assay. On the other hand, the Proteus and Rothia RdnI N-terminal regions show higher confidence at the interface with RdnE. Despite this, the C-terminus of the Proteus RdnI shows especially low confidence (pLDDT ~50%) where it might interact near RdnE’s active site (as suggested by reviewer 1). Given this low confidence and the already stated inaccuracies of AI-generated complexes, we would rather wait for crystallization data to inform potential protection mechanisms of RdnI.

      Author response image 1.

    1. Author response:

      Description of the planned revisions

      Reviewer #1 (Evidence, reproducibility and clarity):

      Summary

      The authors focused on medaka retinal organoids to investigate the mechanism underlying the eye cup morphogenesis. The authors succeeded to induce lens formation in fish retinal organoids using 3D suspension culture with minimal growth factor-containing media containing the Hepes. At day 1, Rx3:H2B-GFP+ cells appear in the surface region of organoids. At day 1.5, Prox1+cells appear in the interface area between the organoid surface and the core of central cell mass, which develops a spherical-shaped lens later. So, Prox1+ cells covers the surface of the internal lens cell core. At day 2, foxe3:GFP+ cells appear in the Prox1+ area, where early lens fiber marker, LFC, starts to be expressed. In addition, foxe3:GFP+ cells show EdU+ incorporation, indicating that foxe3:GFP+ cells have lens epithelial cell-characters. At day 4, cry:EGFP+ cells differentiate inside the spherical lens core, whose the surface area consists of LFC+ and Prox1+ cells. Furthermore, at day 4, the lens core moves towards the surface of retinal organoids to form an eye-cup like structure, although this morphogenesis "inside out" mechanism is different from in vivo cellular "outside -in" mechanism of eye cup formation. From these data, the authors conclude that optic cup formation, especially the positioning of the lens, is established in retinal organoids though the different mechanism of in vivo morphogenesis.

      Overall, manuscript presentation is nice. However, there are still obscure points to understand background mechanism. My comments are shown below.

      Major comments

      (1) At the initial stage of retinal organoid morphogenesis, a spherical lens is centrally positioned inside the retinal organoids, with the central lens core covered by the outer cell sheet of retinal precursor cells. I wonder if the formation of this structure may be understood by differential cell adhesive activity or mechanical tension between lens core cells and the retinal cell sheet, just like the previous study done by the Heisenberg lab on the spatial patterning of endoderm, mesoderm and ectoderm (Nat. Cell Biol. 10, 429 - 436 (2008)). Lens core cells may be integrated inside the retinal cell mass by cell sorting through the direct interaction between retinal cells and lens cells, or between lens cells and the culture media. It is also possible to understand why the lens core moves towards the surface of retinal organoids after day 1, if the adhesive/tensile force states of lens core cells are changed by secretion of extracellular matrix. I wonder if the authors measured the physical properties, such as adhesive activity and solidness, of retinal precursor cells and lens core cells. If retinal organoids at day 1 are dissociated and cultured again, do they show the same patterning of an internal lens core covered by the outer retinal cell sheet?

      Whether differential adhesion is involved in cell sorting and lens formation is indeed a very intriguing question. To address this point, we will include an additional experiment (see Revision Plan, experiment 1). This experiment will be based on the dissociation and re-aggregation of lens-forming organoids, as suggested by the reviewer. To monitor cell type-specific sorting, we will employ the lens progenitor reporter line Foxe3::GFP and the retina-specific Rx2::H2B-RFP. If different adhesive activities of lens and retinal progenitor cells drive the process of cell sorting, the cells will sort according to their identity after dissociation and re-aggregation.

      (2) The optic cup is evaginated from the lateral wall of the neuroepithelium of the diencephalon. In zebrafish, cell movement occurs from the pigment epithelium to the neural retina during eye morphogenesis in an FGF-dependent manner. How is medaka optic cup morphogenesis coordinated? I also wonder if the authors could track cell migration during optic cup morphogenesis to reveal how cell migration and cell division are regulated in the lens of the medaka retinal organoids. It would also be interesting to examine how retinal cell movement is coordinated in medaka retinal organoids.

      Looking in detail at how the optic cup-like tissue arrangement of ocular organoids is achieved at the cellular level is of course interesting. Our previous study showed that optic vesicles of medaka retinal organoids do not form optic cups (for details please see Zilova et al., 2021, eLife). We assume that the formation of the cup-like structure of the ocular organoids is mediated by the following processes. First, retina and lens domains are established at specific regions of the organoid: retina on the surface and lens in the center (see Figure S2 d, Figure 3e, and Figure 4). Second, the centrally formed lens is displaced through the retinal layer towards the organoid periphery while retinal cells stay static. We assume that the “cup-like” shape is acquired by extrusion of the lens from the center of the organoid. To clarify this process with respect to tissue rearrangements and cell movements, we will include additional experiments (see Revision Plan, experiment 2) and follow lens- and retina-fated cells (by employing lens-specific Foxe3::GFP and retina-specific Rx2::H2B-RFP reporter lines) through the process of lens extrusion to dissect the individual contributions of retinal and lens cells to this process (cross-reference with Reviewer #2).

      (3) The authors showed that blockade of FGF signaling affects lens fiber differentiation in day 1-2, whereas lens formation seems to be intact in the presence of FGF receptor inhibitor in day 0-1. I suggest the authors to examine which tissue is a target of FGF signaling in retinal organoids, using markers such as pea3, which is a downstream target of ERK branch of FGF signaling. Since FGF signaling promotes cell proliferation, is the lens core size normal in SU5402-treated organoids from day 0 to day 1?

      Assessing the activity of FGF signaling in the organoids is indeed an important point (cross-reference with Reviewer #3). To address which tissue is the target of FGF signaling, we will include additional experiments and assess the phosphorylation status of ERK (pERK) and the expression of the ERK downstream target pea3, as suggested by the reviewer (see Revision Plan, experiment 3). This will allow us to identify the tissue within the organoid that responds to FGF signaling.

      Lens core size of organoids treated with SU5402 from day 0 to day 1 is fully comparable to the control (please see Figure 6b).

      (4) Fig. 3f and 3g indicate that there is some cell population located between foxe3:GFP+ cells and rx2:H2B-RFP+ cells. What kind of cell-type is occupied in the interface area between foxe3:GFP+ cells and rx2:H2B-RFP+ cells?

      That is certainly an interesting question. We are aware of this population of cells, but we currently do not have data that would clarify their fate with certainty. We are following up on this question using scRNA sequencing; however, we will not be able to address it in the current manuscript.

      (5) Fig. 5e indicates the depth of Rx3 expression at day 1. Is the depth the thickness of Rx3 expressing cell sheet, which covers the central lens core in the organoids? If so, I wonder if total cell number of Rx3 expressing cell sheet may be different in each seeded-cell number, because thickness is the same across each seeded-cell number, but the surface area size may be different depending on underneath the lens core size. Please clarify this point.

      Yes. Figure 5e indicates the thickness of the cell sheet expressing Rx3 that lies on the surface of the organoid. Indeed, the number of Rx3-expressing cells (and lens cells) scales with the size of the organoid as stated in the submitted manuscript.

      (6) Noggin application inhibits lens formation at day 0-1. BMP signaling regulates formation of lens placode and olfactory placode at the early stage of development. It is interesting to examine whether Noggin-treated organoid expands olfactory placode area. Please check forebrain territory markers.

      Which tissue differentiates at the expense of the lens in BMP inhibitor-treated organoids is of course an intriguing question. To address the identity of the cells that differentiate under this condition, we will include an additional experiment, as suggested by the reviewer (see Revision Plan, experiment 4). We will check for the expression of Lhx2, Otx2 and HuC/D to address this point.

      I have no minor comments

      Referees cross-commenting

      I agree that all reviewers have similar suggestions, which are reasonable and provided the same estimated time for revision.

      Reviewer #1 (Significance):

      Strength:

      This study is unique. The authors examined eye cup morphogenesis using fish retinal organoids. The eye cup normally consists of the lens, the neural retina, pigment epithelium and optic stalk. However, retinal organoids seem to be simple and consist of two cell types, lens and retina. Interestingly, a similar optic cup-like structure is achieved in both cases; however, the underlying mechanism is different. It is interesting to investigate how eye morphogenesis is regulated in retinal organoids, under the unconstrained embryo-free environment.

      Limitation:

      Description is OK, but the analysis is not very profound. It is necessary to apply a bit more molecular and cellular level analysis, such as tracking of cell movement and visualization of FGF signaling in organoid tissues.

      Advancement:

      The current study is descriptive. It needs some conceptual advance that would have an impact on the cell biology field or medical science.

      Audience:

      The target audience of the current study is still within the ophthalmology and neuroscience communities, maybe translational/clinical rather than basic biology. To reach beyond specific fields, the authors need to formulate a general principle for cell and developmental biology.

      Reviewer #2 (Evidence, reproducibility and clarity):

      In this study from Stahl et al., the authors demonstrate that medaka pluripotent embryonic cells can self-organise into eye organoids containing both retina and lens tissues. While these organoids can self-organize into an eye structure that resembles the vertebrate eye, they are built from a fundamentally different morphogenetic process – an “inside-out” mechanism where the lens forms centrally and moves outward, rather than the normal “outside-in” embryonic process. This is a very interesting discovery, both for our understanding of developmental biology and the potential for tissue engineering applications. The study would benefit from some additional experiments and a few clarifications.

      The authors suggest that the lens cells are the ones that move from the central to a more superficial position. Is this an active movement of lens cells or just the passive consequence of the retina cells acquiring a cup shape? Are the retina cells migrating behind the lens or the lens cells pushing outwards? High-resolution imaging of organoid cup formation, tracking retina cells in combination with membrane labeling of all cells would help elucidate the morphogenetic processes occurring in the organoids. Membrane labeling would also be useful as Prox1 positive lens cells appear elongated in embryos while in the organoids, cell shapes seem less organised, less compact and not elongated (for example as shown in Fig 3f,g).

      Looking in detail at how the optic cup-like tissue arrangement of ocular organoids is achieved at the cellular level is of course interesting. We assume that the formation of the cup-like structure of the ocular organoids is mediated by the following processes. First, retina and lens domains are established at specific regions of the organoid: retina on the surface and lens in the center (see Figure S2 d, Figure 3e, and Figure 4). Second, the centrally formed lens is displaced through the retinal layer towards the organoid periphery while retinal cells stay static. We assume that the “cup-like” shape is acquired by extrusion of the lens. To clarify this process with respect to tissue rearrangements and cell movements, we will include additional experiments (see Revision Plan, experiment 2). We will follow lens- and retina-fated cells (by employing lens-specific Foxe3::GFP and retina-specific Rx2::H2B-RFP reporter lines) through the process of lens extrusion to dissect the individual contributions of retinal and lens cells to this process (cross-reference with Reviewer #1).

      The organoids could be a useful tool to address how cell fate is linked to cell shape acquisition. In the forming organoids, retinal tissue initially forms on the outside, while non-retinal tissue is located in the centre; this central tissue later expresses lens markers. Do the authors have any insights into why fate acquisition occurs in this pattern? Is there a difference in proliferation rates between the centrally located cells and the external ones? Could it be that highly proliferative cells give rise to neural retina (NR), while lower proliferating cells become lens?

      How the retinal and lens domains are established in this specific manner is indeed an intriguing and very interesting question. We dedicated a part of the discussion to this topic, where we discuss the role of the diffusion limit and the potential contribution of BMP and FGF signaling to this arrangement. Additional experiments (see Revision Plan, experiment 3) addressing the source and target tissues of FGF and BMP signaling in the organoid will bring more clarity to our understanding of the tissue arrangements in the organoid.

      Although an analysis of proliferation at the surface and in the central region of the organoid might show some differences in proliferation rates between lens and retinal cells, we have no indication that the proliferation rate itself is instructive for, or takes precedence over, cell fate decisions.

      What happens in organoids that do not form lenses? Do these organoids still generate foxe3 positive cells that fail to develop into a proper lens structure? And in the absence of lens formation, does the retina still acquire a cup shape?

      Lens formation depends primarily on the specification of Foxe3-expressing lens placode progenitors. If these are not present, a lens does not develop. Once Foxe3-expressing progenitors are established, a lens forms under unperturbed conditions (as measured by the expression of crystallin proteins). Accordingly, organoids that lack a lens do not carry Foxe3-expressing cells.

      In the absence of the lens, the organoid is composed of retinal neuroepithelium, which does not form an optic cup (for details of such phenotypes please see Zilova et al., 2021, eLife).

      The authors suggest that lens formation occurs even in the absence of Matrigel. Is the process slower in these conditions? Are the resulting organoids smaller? While there are indeed some LFC-expressing cells by day 2, these cells are not very well organised and the pattern of expression seems dotty. Moreover, LFC staining seems to localise posterior to the LFC-negative, lens-like structure (e.g. Fig. S1, 3 o’clock).

      How do these organoids develop beyond day 4? Do they maintain their structural integrity at later stages?

      The role of HEPES in promoting organoid formation is intriguing. Do the authors have any insights into why it is important in this context? Have the authors tried other culture conditions and does culture condition influence the morphogenetic pathways occurring within the organoids?

      We thank the reviewer for pointing this out. Our wording and description of this observation were not clear. Indeed, Matrigel is not required for the acquisition of lens fate, as demonstrated by the expression of lens-specific markers. However, the presence of Matrigel has a profound impact on the structural aspects of organoid formation. Matrigel is essential for the organization of retina-committed cells into the retinal epithelium (Zilova et al., 2021, eLife). The absence of a structured retinal epithelium can indeed negatively affect the cellular organization and the overall lens structure. To clarify the contribution of Matrigel to the speed of organoid lens development and to the overall structure of the organoid lens, we will perform additional experiments (see Revision Plan, experiment 5). Using the Foxe3::GFP reporter line, we will measure the onset of lens-specific gene expression. In addition, we will use immunohistochemistry to assess the gross morphology and size of organoids grown without Matrigel (cross-reference with Reviewer #3).

      The role of HEPES in lens formation is indeed very intriguing and is currently under investigation. As HEPES is mainly used to regulate the pH of the culture media, and pH might affect multiple cellular processes, dissecting the molecular mechanism underlying the effect of HEPES on lens formation will require a significant time investment (cross-reference with Reviewer #3) and therefore cannot be addressed in the current manuscript.

      Referees cross-commenting

      Pleased to see that all the other reviewers are positive about the study and raise similar concerns and comments

      Reviewer #2 (Significance):

      This is a very interesting paper, and it will be important to determine whether this alternative morphogenetic process is specific to medaka or if similar developmental routes can be recapitulated in organoid cultures from other vertebrate species.

      Reviewer #3 (Evidence, reproducibility and clarity):

      Summary:

      The manuscript by Stahl and colleagues reports an approach to generate ocular organoids composed of retinal and lens structures, derived from Medaka blastula cells. The authors present a comprehensive characterisation of the timeline followed by lens and retinal progenitors, showing these have distinct origins, and that they recapitulate the expression of differentiation markers found in vivo. Despite this molecular recapitulation, morphogenesis is strikingly different, with lens progenitors arising at the centre of the organoid, and subsequently translocating to the outside.

      Comments:

      - The manuscript presents a beautiful set of high quality images showing expression of lens differentiation markers over time in the organoids. The set of experiments is very robust, with high numbers of organoids analysed and reproducible data. The mechanism by which lens specification is promoted in these organoids is, however, poorly analysed, and the reader does not get a clear understanding of what is different in these experiments, as compared to previous attempts, to support lens differentiation. There is a mention to HEPES supplementation, but no further analysis is provided, and the fact that the process is independent of ECM contradicts, as the authors point out, previous reports. The manuscript would benefit from a more detailed analysis of the mechanisms that lead to lens differentiation in this setting.

      The role of HEPES in lens formation is indeed very intriguing and is currently under investigation. As HEPES is mainly used to regulate the pH of the culture media, and pH might affect multiple cellular processes, dissecting the molecular mechanism underlying the effect of HEPES on lens formation will require a significant time investment (cross-reference with Reviewer #2) and therefore unfortunately cannot be addressed in the current manuscript.

      To clarify the contribution of Matrigel to organoid lens development, we will perform additional experiments (see Revision Plan, experiment 5). Using the Foxe3::GFP reporter line, we will measure the onset of lens-specific gene expression. In addition, we will use immunohistochemistry to assess the gross morphology and size of organoids grown without Matrigel (cross-reference with Reviewer #2).

      - The markers analysed to show onset of lens differentiation in the organoids seem to start being expressed, in vivo, when the lens placode starts invaginating. An analysis of earlier stages is not presented. This would be very informative, allowing to determine whether progenitors differentiate as placode and neuroepithelium first, to subsequently continue differentiating into lens and retina, respectively. Could early placodal and anterior neural plate markers be analysed in the organoids? This would provide a more complete sequence of lens vs retina differentiation in this model.

      Yes. The figures show the expression of lens and retinal markers in the embryo at later developmental stages, and the timing of their expression can be documented with higher temporal resolution. In the revised version of the manuscript, we will provide information about the onset of expression of Rx3::H2B-GFP (retina) and Foxe3::GFP (lens) (see Author response image 1). Rx3 is one of the earliest markers labeling the presumptive eye field within the region of the anterior neural plate (S16, late gastrula). Foxe3::GFP expression can be detected within the head surface ectoderm before the lens placode is formed, showing that Foxe3 is a suitable marker of placodal progenitors in medaka.

      We are convinced that the onset of expression of the Rx3- and Foxe3-driven reporters is early enough to support the claim that the lens (placodal) and retinal (anterior neuroectoderm) tissues within the ocular organoids have separate origins.

      Author response image 1.

      - The analysis of BMP and Fgf requirement for lens formation and differentiation is suggestive, but the source of these signals is not resolved or mentioned in the manuscript. Are BMP4 and Fgf8 expressed by the organoids? Where are they coming from?

      Indeed, addressing the source of BMP and FGF activation would bring more clarity in understanding the mechanism of retina/lens specification within the ocular organoids (cross reference with Reviewer #1). To address this point, we will include additional experiments (see Revision Plan, experiment 3). We will analyze the expression of respective ligands (Bmp4 and Fgf8) and activation of downstream effectors of BMP and FGF signaling pathways within the ocular organoids as suggested by Reviewer #1 and Reviewer #3.

      - The fact that the lens becomes specified in the centre of the organoid is striking, but it is for me difficult to visualise how it ends up being extruded from the organoid. Did the authors try to follow this process in movies? I understand that this may be technically challenging, but it would certainly help to understand the process that leads to the final organisation of retinal and lens tissues in the organoid. There is no discussion of why the morphogenetic mechanism is so different from the in vivo situation. The manuscript would benefit from explicitly discussing this.

      Following the extruding lens in vivo is indeed a very relevant suggestion. To clarify the process of ocular organoid formation with respect to tissue rearrangements and cell movements, we will include an additional experiment (see Revision Plan, experiment 2). We will follow lens- and retina-fated cells (by employing lens-specific Foxe3::GFP and retina-specific Rx2::H2B-RFP reporter lines) through the process of lens extrusion (cross-reference with Reviewer #1 and Reviewer #2).

      Referees cross-commenting

      We all seem to have similar comments and concerns. I think overall the suggestions are feasible and realistic for the timeframe provided.

      Reviewer #3 (Significance):

      This study describes a reproducible approach to differentiate ocular organoids composed of lens and retinal tissues. The characterisation of lens differentiation in this model is very detailed, and despite the morphogenetic differences, the molecular mechanisms show many similarities to the in vivo situation. The manuscript however does not highlight, in my opinion, why this model may be relevant. Clearly articulating this relevance, particularly in the discussion, will enhance the study and provide more clarity to the readers regarding the significance of the study for the field of organoid research, ocular research and regenerative studies.

      Revision Plan:

      (1) To address whether differential adhesion properties of retinal and lens progenitors mediate cell sorting to establish retina and lens domains in the organoids (Reviewer #1, comment 1), we will dissociate the organoids on day 1 and subsequently re-aggregate them. This experiment will allow us to follow the cell type-specific adhesion properties of lens and retinal progenitor cells. We will employ the lens progenitor reporter line Foxe3::GFP and the retina-specific Rx2::H2B-RFP to monitor cell type-specific sorting by fluorescence microscopy.

      (2) Multiple reviewers (Reviewer #1, Reviewer #2, Reviewer #3) asked for a detailed in vivo imaging experiment showing the individual contributions of retina- and lens-fated cells to the resulting tissue organization within the ocular organoid. We will perform an in vivo live imaging experiment to follow the movements of individual lens (Foxe3::GFP) and retinal (Rx2::H2B-RFP) cells from day 1 to day 2 of organoid development to address this point.

      (3) Reviewer #1 and Reviewer #3 raised questions concerning the role of FGF and BMP signaling and the sources of these signaling activities in ocular organoid tissue arrangement. To address this point and shed more light on the molecular mechanisms regulating lens and retina tissue arrangement in the organoid, we will perform additional experiments. We will assess the expression of candidate FGF and BMP ligands (Fgf8, Bmp7 and Bmp4), the activation of downstream effectors (p-ERK, p-SMAD) and the expression of the direct transcriptional target of FGF signaling (Pea3) in the developing organoids. This will allow us to identify the tissue producing the ligand on the one hand and the tissue responding to the signal on the other, and will help to narrow down the molecular mechanism controlling tissue arrangements in the organoid.

      (4) We will analyze the expression of forebrain territory markers in organoids treated with the BMP inhibitor to determine the identity of the tissue that differentiates at the expense of the lens under BMP inhibition (suggested by Reviewer #1). We will label Noggin-treated organoids with antibodies against Lhx2, Otx2 and HuC/D to address this point.

      (5) We will provide a more comprehensive analysis of organoids grown without Matrigel and compare them to organoids grown in the presence of Matrigel (mentioned by Reviewer #2 and Reviewer #3). Using the lens progenitor-specific Foxe3::GFP reporter line, we will measure the onset of lens-specific gene expression. In addition, we will use immunohistochemistry to assess the gross morphology and size of organoids grown without Matrigel.

      Description of analyses that authors prefer not to carry out

      Reviewer #1:

      (4) Fig. 3f and 3g indicate that there is some cell population located between foxe3:GFP+ cells and rx2:H2B-RFP+ cells. What kind of cell-type is occupied in the interface area between foxe3:GFP+ cells and rx2:H2B-RFP+ cells?

      That is certainly an interesting question. We are aware of this population of cells, but we currently do not have data that would clarify their fate with certainty. We are following up on this question using scRNA sequencing; however, we will not be able to address it in the current manuscript.

      Reviewer #2:

      The role of HEPES in promoting organoid formation is intriguing. Do the authors have any insights into why it is important in this context? Have the authors tried other culture conditions and does culture condition influence the morphogenetic pathways occurring within the organoids?

      The role of HEPES in lens formation is indeed very intriguing and is currently under investigation. As HEPES is mainly used to regulate the pH of the culture media, and pH might affect multiple cellular processes, dissecting the molecular mechanism underlying the effect of HEPES on lens formation will require a significant time investment (cross-reference with Reviewer #3) and cannot be addressed in the current manuscript.

      Is there a difference in proliferation rates between the centrally located cells and the external ones? Could it be that highly proliferative cells give rise to neural retina (NR), while lower proliferating cells become lens?

      Although an analysis of proliferation at the surface and in the central region of the organoid might show some differences in proliferation rates between lens and retinal cells, we have no indication that the proliferation rate itself is instructive for, or takes precedence over, cell fate decisions.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper is an elegant, mostly observational work, detailing observations that polysome accumulation appears to drive nucleoid splitting and segregation. Overall I think this is an insightful work with solid observations.

      Thank you for your appreciation and positive comments. In our view, an appealing aspect of this proposed biophysical mechanism for nucleoid segregation is its self-organizing nature and its ability to intrinsically couple nucleoid segregation to biomass growth, regardless of nutrient conditions.

      Strengths:

      The strengths of this paper are the careful and rigorous observational work that leads to their hypothesis. They find the accumulation of polysomes correlates with nucleoid splitting, and that the nucleoid segregation occurring right after splitting correlates with polysome segregation. These correlations are also backed up by other observations:

      (1) Faster polysome accumulation and DNA segregation at faster growth rates.

      (2) Polysome distribution negatively correlating with DNA positioning near asymmetric nucleoids.

      (3) Polysomes form in regions inaccessible to similarly sized particles.

      These above points are observational, I have no comments on these observations leading to their hypothesis.

      Thank you!

      Weaknesses:

      It is hard to state weaknesses in any of the observational findings, and furthermore, their two tests of causality, while not being completely definitive, are likely the best one could do to examine this interesting phenomenon.

      It is indeed difficult to prove causality in a definitive manner when the proposed coupling mechanism between nucleoid segregation and gene expression is self-organizing, i.e., does not involve a dedicated regulatory molecule (e.g., a protein, RNA, metabolite) that we could have depleted through genetic engineering to establish causality. We are grateful to the reviewer for recognizing that our two causality tests are the best that can be done in this context.

      Points to consider / address:

      Notably, demonstrating causality here is very difficult (given the coupling between transcription, growth, and many other processes) but an important part of the paper. They do two experiments toward demonstrating causality that help bolster - but not prove - their hypothesis. These experiments have minor caveats, my first two points.

      (1) First, "Blocking transcription (with rifampicin) should instantly reduce the rate of polysome production to zero, causing an immediate arrest of nucleoid segregation". Here they show that adding rifampicin does indeed lead to polysome loss and an immediate halting of segregation - data that does fit their model. This is not definitive proof of causation, as rifampicin also (a) stops cell growth, and (b) stops the translation of secreted proteins. Neither of these two possibilities is ruled out fully.

      That’s correct; cell growth also stops when gene expression is inhibited, which is consistent with our model in which gene expression within the nucleoid promotes nucleoid segregation and biomass growth (i.e., cell growth), inherently coupling these two processes. This said, we understand the reviewer’s point: the rifampicin experiment doesn’t exclude the possibility that protein secretion and cell growth drive nucleoid segregation. We are assuming that the reviewer is envisioning an alternative model in which sister nucleoids would move apart because they would be attached to the membrane through coupled transcription-translation-protein secretion (transertion) and the membrane would expand between the separating nucleoids, similar to the model proposed by Jacob et al in 1963 (doi:10.1101/SQB.1963.028.01.048). There are several observations arguing against this cell elongation/transertion model.

      (1) For this alternative mechanism to work, membrane growth must be localized at the middle of the splitting nucleoids (i.e., midcell position for slow growth and ¼ and ¾ cell positions for fast growth) to create a directional motion. To our knowledge, there is no evidence of such localized membrane incorporation. Furthermore, even if membrane growth was localized at the right places, the fluidity of the cytoplasmic membrane (PMID: 6996724, 20159151, 24735432, 27705775) would be problematic. To circumvent the membrane fluidity issue, one could potentially evoke an additional connection to the rigid peptidoglycan, but then again, peptidoglycan growth would have to be localized at the middle of the splitting nucleoid. However, peptidoglycan growth is dispersed early in the cell division cycle when the nucleoid splitting happens in fast growing cells and only appears to be zonal after the onset of cell constriction (PMID: 35705811, 36097171, 2656655).

      (2) Even if we ignore the aforementioned caveats, Paul Wiggins’s group ruled out the cell elongation/transertion model by showing that the rate of cell elongation is slower than the rate of chromosome segregation (PMID: 23775792). In the revised manuscript, we will clarify this point and provide confirmatory data showing that the cell elongation rate is indeed slower than the nucleoid segregation rate, indicating that it cannot be the main driver.

      (3) Furthermore, our correlation analysis comparing the rate of nucleoid segregation to the rate of either cell elongation or polysome accumulation argues that polysome accumulation plays a larger role than cell elongation in nucleoid segregation. These data were already shown in Figure 1H and Figure 1 – figure supplement 3 of the original manuscript but were not highlighted in this context. We will revise the text to clarify this point.

      (4) The asymmetries in nucleoid compaction that we described in our paper are predicted by our model. We do not see how they could be explained by cell growth or protein secretion.

      (5) We also show that polysome accumulation at ectopic sites (outside the nucleoid) results in correlated nucleoid dynamics, consistent with our proposed mechanism. These nucleoid dynamics cannot be explained by cell growth or protein secretion (transertion).

      (1a) As rifampicin also stops all translation, it also stops translational insertion of membrane proteins, which in many old models has been put forward as a possible driver of nucleoid segregation, and perhaps independent of growth. This should at last be mentioned in the discussion, or if there are past experiments that rule this out it would be great to note them.

      It is not clear to us how the attachment of the DNA to the cytoplasmic membrane could alone create a directional force to move the sister nucleoids. We agree that old models have proposed a role for cell elongation (providing the force) and transertion (providing the membrane tether).  Please see our response above for the evidence (from the literature and our work) against it. This was mentioned in the introduction and Results section, but we agree that this was not well explained. We will add experimental data and revise the text to clarify these points.

      (1b) They address at great length in the discussion the possibility that growth may play a role in nucleoid segregation. However, this is testable - by stopping surface growth with antibiotics. Cells should still accumulate polysomes for some time, it would be easy to see if nucleoids are still segregated, and to what extent, thereby possibly decoupling growth and polysome production. If successful, this or similar experiments would further validate their model.

      We reviewed the literature and could not find a drug that stops cell growth without stopping gene expression. Any drug that affects the membrane integrity or potential stops gene expression, which requires ATP. However, our experiment in which we drive polysome accumulation at ectopic sites decouples polysome accumulation from cell growth. In this experiment, by redirecting most chromosomal gene expression to a single plasmid-encoded gene, we reduce the rate of cell growth but still create a large accumulation of polysomes at an ectopic location. This ectopic polysome accumulation is sufficient to affect nucleoid dynamics in a correlated fashion. In the revised manuscript, we will clarify this point and add model simulations to show that our experimental observations are predicted by our model.

      (2) In the second experiment, they express excess TagBFP2 to delocalize polysomes from midcell. Here they again see the anticorrelation of the nucleoid and the polysomes, and in some cells, it appears similar to normal (polysomes separating the nucleoid) whereas in others the nucleoid has not separated. The one concern about this data - and the differences between the "separated" and "non-separated" nuclei - is that the over-expression of TagBFP2 has a huge impact on growth, which may also have an indirect effect on DNA replication and termination in some of these cells. Could the authors demonstrate these cells contain 2 fully replicated DNA molecules that are able to segregate?

      We will perform the requested experiment.

      (3) What is not clearly stated and is needed in this paper is to explain how polysomes do (or could) "exert force" in this system to segregate the nucleoid: what a "compaction force" is by definition, and what mechanisms causes this to arise (what causes the "force") as the "compaction force" arises from new polysomes being added into the gaps between them caused by thermal motions.

      They state, "polysomes exert an effective force", and they note their model requires "steric effects (repulsion) between DNA and polysomes" for the polysomes to segregate, which makes sense. But this makes it unclear to the reader what is giving the force. As written, it is unclear if (a) these repulsions alone are making the force, or (b) is it the accumulation of new polysomes in the center by adding more "repulsive" material, the force causes the nucleoids to move. If polysomes are concentrated more between nucleoids, and the polysome concentration does not increase, the DNA will not be driven apart (as in the first case). However, in the second case (which seems to be their model), the addition of new material (new polysomes) into a sterically crowded space is not exerting force - it is filling in the gaps between the molecules in that region, space that needs to arise somehow (like via Brownian motion). In other words, if the polysome region is crowded with polysomes, space must be made between these polysomes for new polysomes to be inserted, and this space must be made by thermal (or ATP-driven) fluctuations of the molecules. Thus, if polysome accumulation drives the DNA segregation, it is not "exerting force", but rather the addition of new polysomes is iteratively rectifying gaps being made by Brownian motion.

      We apologize for the understandable confusion. In our picture, the polysomes and DNA (conceptually considered as small plectonemic segments) basically behave as dissolved particles. If these particles were noninteracting, they would simply mix. However, both polysomes and DNA segments are large enough to interact sterically. So as density increases, steric avoidance implies a reduced conformational entropy and thus a higher free energy per particle. We argue (based on Miangolarra et al. PNAS 2021 PMID: 34675077 and Xiang et al. Cell 2021 PMID: 34186018) that the demixing of polysomes and DNA segments occurs because DNA segments pack better with each other than they do with polysomes. This raises the free energy cost associated with DNA-polysome interactions compared to DNA-DNA interactions. We model this effect by introducing a term in the free energy, χ_np, which we refer to as a repulsion between DNA and polysomes, though as explained above it arises from entropic effects. At realistic cellular densities of DNA and polysomes, this repulsive interaction is strong enough to cause the DNA and polysomes to phase separate.
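
      One common way to write a free energy of this kind is a Flory-Huggins-type mean-field form. The expression below is only an illustrative sketch under that assumption and is not necessarily the form used in the manuscript. The symbols φ_n, φ_p, and φ_s (local volume fractions of DNA segments, polysomes, and solvent) and N_n, N_p (relative particle sizes) are introduced here for illustration; only χ_np corresponds to the term named above.

```latex
\frac{f}{k_B T} \;=\; \frac{\phi_n}{N_n}\ln\phi_n \;+\; \frac{\phi_p}{N_p}\ln\phi_p \;+\; \phi_s\ln\phi_s \;+\; \chi_{np}\,\phi_n\,\phi_p ,
\qquad \phi_n + \phi_p + \phi_s = 1 .
```

      In this sketch, the logarithmic terms favor mixing, whereas the χ_np term penalizes DNA-polysome contacts, so demixing becomes favorable once χ_np is large enough at a given density.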

      This same density-dependent free energy that causes phase separation can also give rise to forces, just in the way that a higher pressure on one side of a wall can give rise to a net force on the wall. Indeed, the “compaction force” we refer to is fundamentally an osmotic pressure difference. At some stages during nucleoid segregation, the region of the cell between nucleoids has a higher polysome concentration, and therefore a higher osmotic pressure, than the regions near the poles. This results in a net poleward force on the sister nucleoids that drives their migration toward the poles. This migration continues until the osmotic pressure equilibrates. Therefore, both phase separation (due to the steric repulsion described above) and nonequilibrium polysome production and degradation (which creates the initial accumulation of polysomes around midcell) are essential ingredients for nucleoid segregation.
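
      As a rough illustration of this pressure picture, consider a nucleoid with cross-sectional area A that faces a polysome concentration c_mid on its midcell side and c_pole on its pole side. The dilute (van 't Hoff) limit used below is an assumption made only for simplicity, and the symbols A, c_mid, and c_pole are hypothetical names introduced for this sketch; at crowded cellular densities the pressure would be larger than this estimate.

```latex
\Pi \;\approx\; c\,k_B T ,
\qquad
F \;\approx\; A\,\bigl(\Pi_{\mathrm{mid}} - \Pi_{\mathrm{pole}}\bigr)
  \;\approx\; A\,k_B T\,\bigl(c_{\mathrm{mid}} - c_{\mathrm{pole}}\bigr) .
```

      The net force is poleward when c_mid exceeds c_pole and vanishes when the two concentrations, and hence the osmotic pressures, are equal.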

      This will be clarified in the revised text, with the support of additional simulation results.

      The authors use polysome accumulation and phase separation to describe what is driving nucleoid segregation. Both terms are accurate, but it might help the less physically inclined reader to have one term, or have what each of these means explicitly defined at the start. I say this most especially in terms of "phase separation", as the currently huge momentum toward liquid-liquid interactions in biology causes the phrase "phase separation" to often evoke a number of wider (and less defined) phenomena and ideas that may not apply here. Thus, a simple clear definition at the start might help some readers.

      Phase separation means that the DNA-polysome steric repulsion is strong enough to drive their demixing, which creates a compact nucleoid. As mentioned in a previous point, this effect is captured in the free energy by the χ_np term, which is an effective repulsion between DNA and polysomes, though as explained above it arises from entropic effects.

      In the revised manuscript, we will illustrate this with our theoretical model by initializing a cell with a diffuse nucleoid and low polysome concentration. For the sake of simplicity, we assume that the cell does not elongate. We observe that the DNA-polysome steric repulsion is sufficient to compact the nucleoid and place it at mid-cell.

      (4) Line 478. "Altogether, these results support the notion that ectopic polysome accumulation drives nucleoid dynamics". Is this right? Should it not read "results support the notion that ectopic polysome accumulation inhibits/redirects nucleoid dynamics"?

      We think that this is correct; the ectopic polysome accumulation drives nucleoid dynamics. In our theoretical model, we can introduce polysome production at fixed sources to mimic the experiments where ectopic polysome production is achieved by high plasmid expression (Fig. 6). The model is able to recapitulate the two main phenotypes observed in experiments. These new simulation results will be added to the revised manuscript.

      (5) It would be helpful to clarify what happens as the RplA-GFP signal decreases at midcell in Figure 1- is the signal then increasing in the less "dense" parts of the cell? That is, (a) are the polysomes at midcell redistributing throughout the cell? (b) is the total concentration of polysomes in the entire cell increasing over time?

      It is a redistribution—the RplA-GFP signal remains constant in concentration from cell birth to division (Figure 1 – Figure Supplement 1E). This will be clarified in the revised text.

      (6) Line 154. "Cell constriction contributed to the apparent depletion of ribosomal signal from the mid-cell region at the end of the cell division cycle (Figure 1B-C and Movie S1)" - It would be helpful if when cell constriction began and ended was indicated in Figures 1B and C.

      Good idea. We will add markers to indicate the start of cell constriction. We will also indicate that cell birth and division correspond to the first and last images/timepoint in Fig. 1B and C, respectively.

      (7) In Figure 7 they demonstrate that radial confinement is needed for longitudinal nucleoid segregation. It should be noted (and cited) that past experiments of Bacillus L-forms in microfluidic channels showed a clear role for rod shape (and a given width) in the positioning and the spacing of the nucleoids.

      Wu et al, Nature Communications, 2020 . "Geometric principles underlying the proliferation of a model cell system" https://dx.doi.org/10.1038/s41467-020-17988-7

      Good point. We will add this reference. Thank you.

      (8) "The correlated variability in polysome and nucleoid patterning across cells suggests that the size of the polysome-depleted spaces helps determine where the chromosomal DNA is most concentrated along the cell length. This patterning is likely reinforced through the displacement of the polysomes away from the DNA dense region"

      It should be noted this likely functions not just in one direction (polysomes dictating DNA location), but also in the reverse - as the footprint of compacted DNA should also exclude (and thus affect) the location of polysomes.

      We agree that the effects could go both ways at this early stage of the story. We will revise the text accordingly.  

      (9) Line 159. "Rifampicin is a transcription inhibitor that causes polysome depletion over time. This indicates that all ribosomal enrichments consist of polysomes and therefore will be referred to as polysome accumulations hereafter". Here and throughout this paper they use the term polysome, but cells also have monosomes (and 2 somes, etc). Rifampicin stops the assembly of all of these, and thus the loss of localization could occur from both. Thus, is it accurate to state that all transcription events occur in polysomes? Or are they grouping all of the n-somes into one group?

      In the discussion, we noted that our term “polysomes” also includes monosomes for simplicity, but we agree that the term should have been defined much earlier. This will be done in the revised manuscript.

      Thank you for the valuable comments and suggestions!

      Reviewer #2 (Public review):

      Summary:

      The authors perform a remarkably comprehensive, rigorous, and extensive investigation into the spatiotemporal dynamics between ribosomal accumulation, nucleoid segregation, and cell division. Using detailed experimental characterization and rigorous physical models, they offer a compelling argument that nucleoid segregation rates are determined at least in part by the accumulation of ribosomes in the center of the cell, exerting a steric force to drive nucleoid segregation prior to cell division. This evolutionarily ingenious mechanism means cells can rely on ribosomal biogenesis as the sole determinant for the growth rate and cell division rate, avoiding the need for two separate 'sensors,' which would require careful coupling.

      Terrific summary! Thank you for your positive assessment.

      Strengths:

      In terms of strengths; the paper is very well written, the data are of extremely high quality, and the work is of fundamental importance to the field of cell growth and division. This is an important and innovative discovery enabled through a combination of rigorous experimental work and innovative conceptual, statistical, and physical modeling.

      Thank you!

      Weaknesses:

      In terms of weaknesses, I have three specific thoughts.

      Firstly, my biggest question (and this may or may not be a bona fide weakness) is how unambiguously the authors can be sure their ribosomal labeling is reporting on polysomes, specifically. My reading of the work is that the loss of spatial density upon rifampicin treatment is used to infer that spatial density corresponds to polysomes, yet this feels like a relatively indirect way to get at this question, given rifampicin targets RNA polymerase and not translation. It would be good if a more direct way to confirm polysome dependence were possible.

      The heterogeneity of ribosome distribution inside E. coli cells has been attributed to polysomes by many labs (PMID: 25056965, 38678067, 22624875, 31150626, 34186018, 10675340).  The attribution is also consistent with single-molecule tracking experiments showing that slow-moving ribosomes (polysomes) are excluded by the nucleoid whereas fast-diffusing ribosomes (free ribosomal subunits) are distributed throughout the cytoplasm (PMID: 25056965, 22624875).

      Furthermore, inhibition of translation initiation with kasugamycin treatment, which decreases the pool of polysomes, results in a homogenization of ribosomes and expansion of the nucleoid (see Author response image 1). This further supports the rifampicin experiments. Given that the attribution of ribosome heterogeneity to polysomes is well accepted in the field, we would prefer to not include these kasugamycin data in the revised manuscript because long-term exposure to this drug leads to nucleoid re-compaction (PMID: 25250841 and PMID: 34186018). This secondary effect may possibly be due to a dysregulated increase in synthesis of naked rRNAs (PMID: 14460744, PMID: 2114400, and PMID: 2448483) or ribosome aggregation, which we are currently investigating.

      Author response image 1.

      Effects of kasugamycin treatment on the intracellular distribution of ribosomes and nucleoids. Representative single cell (CJW7323) growing in M9gluCAAT. Kasugamycin (3 mg/mL) was added at time = 0 min. Shown is the early response (0-30 min) to the drug, characterized by the homogenization of the ribosomal RplA-GFP fluorescence and the expansion of the HupA-mCherry-labeled nucleoids. For each segmented cell, the RplA-GFP and HupA-mCherry signals were normalized by the average fluorescence.

      Second, the authors invoke a phase separation model to explain the data, yet it is unclear whether there is any particular evidence supporting such a model, whether they can exclude simpler models of entanglement/local diffusion (and/or perhaps this is what is meant by phase separation?) and it's not clear if claiming phase separation offers any additional insight/predictive power/utility. I am OK with this being proposed as a hypothesis/idea/working model, and I agree the model is consistent with the data, BUT I also feel other models are consistent with the data. I also very much do not think that this specific aspect of the paper has any bearing on the paper's impact and importance.

      We appreciate the reviewer’s comment, but the output of our reaction-diffusion model is a bona fide phase separation (spinodal decomposition). So, we feel that we need to use the term when reporting the modeling results. Inside the cell, the situation is more complicated. As the reviewer points out, there likely are entanglements (not considered in our model) and other important factors (please see our discussion on the model limitations). This said, we will revise our text to clarify our terms and proposed mechanism.
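
      To make the term "spinodal decomposition" concrete, the following minimal sketch checks the textbook instability criterion for a symmetric two-component Flory-Huggins mixture. This binary free energy is an assumption chosen for illustration and is simpler than the reaction-diffusion model discussed here; the function names and the parameters `phi` and `chi` are hypothetical.

```python
def fh_curvature(phi: float, chi: float) -> float:
    """Second derivative of the symmetric binary Flory-Huggins free energy.

    f(phi) = phi*ln(phi) + (1 - phi)*ln(1 - phi) + chi*phi*(1 - phi)
    (in units of k_B T per site), so f''(phi) = 1/phi + 1/(1 - phi) - 2*chi.
    """
    if not 0.0 < phi < 1.0:
        raise ValueError("phi must lie strictly between 0 and 1")
    return 1.0 / phi + 1.0 / (1.0 - phi) - 2.0 * chi


def is_spinodally_unstable(phi: float, chi: float) -> bool:
    """A uniform mixture is linearly unstable (spinodal) where f'' < 0."""
    return fh_curvature(phi, chi) < 0.0


if __name__ == "__main__":
    # At equal volume fractions the threshold is chi = 2 in this toy model.
    for chi in (1.5, 2.0, 2.5):
        print(chi, fh_curvature(0.5, chi), is_spinodally_unstable(0.5, chi))
```

      In this toy model, a uniform mixture at equal volume fractions stays mixed for a repulsion parameter below 2 and spontaneously demixes above it, which is the sense in which a strong enough effective repulsion produces phase separation without any nucleation step.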

      Finally, the writing and the figures are of extremely high quality, but the sheer volume of data here is potentially overwhelming. I wonder if there is any way for the authors to consider stripping down the text/figures to streamline things a bit? I also think it would be useful to include visually consistent schematics of the question/hypothesis/idea each of the figures is addressing to help keep readers on the same page as to what is going on in each figure. Again, there was no figure or section I felt was particularly unclear, but the sheer volume of text/data made reading this quite the mental endurance sport! I am completely guilty of this myself, so I don't think I have any super strong suggestions for how to fix this, but just something to consider.

      We agree that there is a lot to digest. We will add schematics and a didactic simulation. We will also try to streamline the text.

      Reviewer #3 (Public review):

      Summary:

      Papagiannakis et al. present a detailed study exploring the relationship between DNA/polysome phase separation and nucleoid segregation in Escherichia coli. Using a combination of experiments and modelling, the authors aim to link physical principles with biological processes to better understand nucleoid organisation and segregation during cell growth.

      Strengths:

      The authors have conducted a large number of experiments under different growth conditions and physiological perturbations (using antibiotics) to analyse the biophysical factors underlying the spatial organisation of nucleoids within growing E. coli cells. A simple model of ribosome-nucleoid segregation has been developed to explain the observations.

      Weaknesses:

      While the study addresses an important topic, several aspects of the modelling, assumptions, and claims warrant further consideration.

      Thank you for your feedback. Please see below for a response to each concern. 

      Major Concerns:

      Oversimplification of Modelling Assumptions:

      The model simplifies nucleoid organisation by focusing on the axial (long-axis) dimension of the cell while neglecting the radial dimension (cell width). While this approach simplifies the model, it fails to explain key experimental observations, such as:

      (1) Inconsistencies with Experimental Evidence:

      The simplified model presented in this study predicts that translation-inhibiting drugs like chloramphenicol would maintain separated nucleoids due to increased polysome fractions. However, experimental evidence shows the opposite-separated nucleoids condense into a single lobe post-treatment (Bakshi et al 2014), indicating limitations in the model's assumptions/predictions. For the nucleoids to coalesce into a single lobe, polysomes must cross the nucleoid zones via the radial shells around the nucleoid lobes.

      We do not think that the results from chloramphenicol-treated cells are inconsistent with our model. Our proposed mechanism predicts that nucleoids will condense in the presence of chloramphenicol, consistent with experiments. It also predicts that nucleoids that were still relatively close at the time of chloramphenicol treatment could fuse if they eventually touched through diffusion (thermal fluctuation) to reduce their interaction with the polysomes and minimize their conformational energy. Fusion is, however, not expected for well-separated nucleoids since their diffusion is slow in the crowded cytoplasm. This is consistent with our experimental observations: In the presence of a growth-inhibitory concentration of chloramphenicol (70 μg/mL), nucleoids in relatively close proximity can fuse, but well-separated nucleoids condense and do not fuse. Since the growth rate inhibition is not immediate upon chloramphenicol treatment, many cells with well-separated condensed nucleoids divide during the first hour. As a result, the non-fusion phenotype is more obvious in non-dividing cells, achieved by pre-treating cells with the cell division inhibitor cephalexin (50μg/mL). In these polyploid elongated cells, well-separated nucleoids condensed but did not fuse, not even after an hour in the presence of chloramphenicol (as illustrated in Author response image 2).

      In Bakshi et al, 2014, nucleoid fusion was shown for a single cell in which the sister nucleoids were relatively close to each other at the time of chloramphenicol treatment. Population statistics were provided for the relative length and width of the nucleoids, but not for the fusion events. So, it is unclear whether the illustrated fusion was universal or not. Also, we note that Bakshi et al (2014) used a chloramphenicol concentration of 300 μg/mL, which is 20-fold higher than the minimal inhibitory concentration for growth, compared to 70 μg/mL in our experiments.

      Author response image 2.

      Effects of chloramphenicol treatment on the intracellular distribution of ribosomes and nucleoids in non-dividing cells. Exponentially growing cells (M9glyCAAT at 30°C) were pre-treated with cephalexin for one hour before being spotted on a 1% agarose pad for time-lapse imaging. The agarose pad contained M9glyCAAT, cephalexin, and chloramphenicol. (A) Phase contrast, RplA-GFP fluorescence and HupA-mCherry fluorescence images of a representative single cell. Three timepoints are shown, including the first image after spotting on the agarose pad (at 0 min), 30 minutes and one hour of chloramphenicol treatment. (B) One-dimensional profiles of the ribosomal (RplA-GFP) and nucleoid (HupA-mCherry) fluorescence from the cells shown in panel A. These intensity profiles correspond to the average fluorescence along the medial axis of the cell considering a 6-pixel region (0.4 μm) centered on the central line of the cell. The fluorescence intensity is plotted along the relative cell length, scaled from 0 to 100% between the two poles, illustrating the relative nucleoid length (L_DNA/L_cell) that was plotted by Bakshi et al in 2014 (PMID: 25250841).
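
      The profile construction described in panel B can be sketched as follows. This is a minimal illustration rather than the analysis code used for the figure: it assumes the cell image has already been straightened so that the medial axis is the middle row, and the half-maximum threshold used to estimate the relative nucleoid length is a hypothetical choice. All function names are introduced here for illustration.

```python
import numpy as np


def medial_profile(image, band_px=6):
    """Average fluorescence over a band of rows centered on the middle row.

    Assumes a straightened cell image with the long axis along columns.
    """
    image = np.asarray(image, dtype=float)
    mid = image.shape[0] // 2
    lo = mid - band_px // 2
    hi = lo + band_px
    if lo < 0 or hi > image.shape[0]:
        raise ValueError("band does not fit inside the image")
    return image[lo:hi].mean(axis=0)


def relative_positions(n_points):
    """Positions along the cell, scaled from 0 to 100% between the poles."""
    return np.linspace(0.0, 100.0, n_points)


def relative_nucleoid_length(dna_profile, frac=0.5):
    """Span of the DNA signal above a threshold, as a fraction of cell length.

    The threshold (frac of the min-to-max range) is a hypothetical choice.
    With several nucleoids, the span runs from the first to the last one.
    """
    p = np.asarray(dna_profile, dtype=float)
    threshold = p.min() + frac * (p.max() - p.min())
    above = np.flatnonzero(p > threshold)
    if above.size == 0:
        return 0.0
    return (above[-1] - above[0] + 1) / p.size


if __name__ == "__main__":
    # Synthetic example: a 10 x 100 pixel cell with DNA signal in columns 30-69.
    cell = np.ones((10, 100))
    cell[:, 30:70] = 10.0
    profile = medial_profile(cell)
    print(relative_positions(profile.size)[[0, -1]], relative_nucleoid_length(profile))
```

      For curved cells, the band would need to follow the segmented medial axis rather than a fixed row, which this sketch does not attempt.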

      (2) The peripheral localisation of nucleoids observed after A22 treatment in this study and others (e.g., Japaridze et al., 2020; Wu et al., 2019), which conflicts with the model's assumptions and predictions. The assumption of radial confinement would predict nucleoids to fill up the volume or ribosomes to go near the cell wall, not the nucleoid, as seen in the data.

      The reviewer makes a good point that DNA attachment to the membrane through transertion likely contributes to the nucleoid being peripherally localized in A22 cells. We will revise the text to add this point. However, we do not think that this contradicts the proposed nucleoid segregation mechanism based on phase separation and out-of-equilibrium dynamics described in our model. On the contrary, by attaching the nucleoid to the cytoplasmic membrane along the cell width, transertion might help reduce the diffusion and thus exchange of polysomes across nucleoids. We will revise the text to discuss transertion over radial confinement.

      (3) The radial compaction of the nucleoid upon rifampicin or chloramphenicol treatment, as reported by Bakshi et al. (2014) and Spahn et al. (2023), also contradicts the model's predictions. This is not expected if the nucleoid is already radially confined.

      We originally invoked radial confinement to explain the observation that polysome accumulations do not equilibrate between DNA-free regions. We agree that transertion is an alternative explanation. Thank you for bringing it to our attention. However, please note that this does not contradict the model. In our view, it actually supports the 1D model by providing a reasonable explanation for the slow exchange of polysomes across DNA-free regions. The attachment of the nucleoid to the membrane along the cell width may act as a diffusion barrier. We will revise the text and the title of the manuscript accordingly.

      (4) Radial Distribution of Nucleoid and Ribosomal Shell:

      The study does not account for well-documented features such as the membrane attachment of chromosomes and the ribosomal shell surrounding the nucleoid, observed in super-resolution studies (Bakshi et al., 2012; Sanamrad et al., 2014). These features are critical for understanding nucleoid dynamics, particularly under conditions of transcription-translation coupling or drug-induced detachment. Work by Yongren et al. (2014) has also shown that the radial organisation of the nucleoid is highly sensitive to growth and the multifork nature of DNA replication in bacteria.

      We will discuss the membrane attachment. Please see the previous response.

      The omission of organisation in the radial dimension and the entropic effects it entails, such as ribosome localisation near the membrane and nucleoid centralisation in expanded cells, undermines the model's explanatory power and predictive ability. Some observations have been previously explained by the membrane attachment of nucleoids (a hypothesis proposed by Rabinovitch et al., 2003, and supported by experiments from Bakshi et al., 2014, and recent super-resolution measurements by Spahn et al.).

      We agree—we will add a discussion about membrane attachment in the radial dimension. See previous responses.

      Ignoring the radial dimension and membrane attachment of nucleoid (which might coordinate cell growth with nucleoid expansion and segregation) presents a simplistic but potentially misleading picture of the underlying factors.

      As mentioned above, we will discuss membrane attachment in the revised manuscript.

      This reviewer suggests that the authors consider an alternative mechanism, supported by strong experimental evidence, as a potential explanation for the observed phenomena:

      Nucleoids may transiently attach to the cell membrane, possibly through transertion, allowing for coordinated increases in nucleoid volume and length alongside cell growth and DNA replication. Polysomes likely occupy cellular spaces devoid of the nucleoid, contributing to nucleoid compaction due to mutual exclusion effects. After the nucleoids separate following ter separation, axial expansion of the cell membrane could lead to their spatial separation.

      This “membrane attachment/cell elongation” model is reminiscent of the hypothesis proposed by Jacob et al in 1963 (doi:10.1101/SQB.1963.028.01.048). There are several lines of evidence arguing against it as the major driver of nucleoid segregation:

      (Below is a slightly modified version of our response to a comment from Reviewer 1—see page 3)

      (1) For this alternative model to work, axial membrane expansion (i.e., cell elongation) would have to be localized at the middle of the splitting nucleoids (i.e., midcell position for slow growth and ¼ and ¾ cell positions for fast growth) to create a directional motion. To our knowledge, there is no evidence of such localized membrane incorporation. Furthermore, even if membrane growth were localized at the right places, the fluidity of the cytoplasmic membrane (PMID: 6996724, 20159151, 24735432, 27705775) would be problematic. To get around this fluidity issue, one could invoke a connection to the rigid peptidoglycan, but then again, peptidoglycan growth would have to be localized at the middle of the splitting nucleoid to “push” the sister nucleoids apart. However, peptidoglycan growth is dispersed prior to cell constriction (PMID: 35705811, 36097171, 2656655).

      (2) Even if we ignore the aforementioned caveats, Paul Wiggins’s group ruled out the cell elongation/transertion model by showing that the rate of cell elongation is slower than the rate of chromosome segregation (PMID: 23775792). In the revised manuscript, we will provide additional data showing that the cell elongation rate is indeed slower than the nucleoid segregation rate.

      (3) Furthermore, our correlation analysis comparing the rate of nucleoid segregation to the rate of either cell elongation or polysome accumulation argues that polysome accumulation plays a larger role than cell elongation in nucleoid segregation. These data were already shown in the original manuscript (Figure 1I and Figure 1 – figure supplement 3) but were not highlighted in this context. We will revise the text to clarify this point.

      (4) The membrane attachment/cell elongation model does not explain the nucleoid asymmetries described in our paper (Figure 3), whereas they can be recapitulated by our model.

      (5) The cell elongation/transertion model cannot predict the aberrant nucleoid dynamics observed when chromosomal expression is largely redirected to plasmid expression. In the revised manuscript, we will add simulation results showing that these nucleoid dynamics are predicted by our model.

      In light of these arguments, we do not believe that a mechanism based on membrane attachment and cell elongation is the major driver of nucleoid segregation. However, we do believe that it may play a complementary role (see “Nucleoid segregation likely involves multiple factors” in the Discussion). We will revise this section to clarify our thoughts and mention the potential role of transertion.

      Incorporating this perspective into the discussion or future iterations of the model may provide a more comprehensive framework that aligns with the experimental observations in this study and previous work.

      As noted above, we will revise the text to mention transertion.

      Simplification of Ribosome States:

      Combining monomeric and translating ribosomes into a single 'polysome' category may overlook spatial variations in these states, particularly during ribosome accumulation at the mid-cell. Without validating uniform mRNA distribution or conducting experimental controls such as FRAP or single-molecule measurements to estimate the proportions of ribosome states based on diffusion, this assumption remains speculative.

      Indeed, for simplicity, we adopt an average description of all polysomes with an average diffusion coefficient and interaction parameters, which is sufficient for capturing the fundamental mechanism underlying nucleoid segregation. To illustrate that considering multiple polysome species does not change the physical picture, we consider an extension of our model, which contains three polysome species, each with a different diffusion coefficient (D_P = 0.018, 0.023, or 0.028 μm²/s), reflecting that polysomes with more ribosomes will have a lower diffusion coefficient. Simulation of this model reveals that the different polysome species have essentially the same concentration distribution, suggesting that the average description in our minimal model is sufficient for our purposes. We will present these new simulation results in the revised manuscript.

    1. Author response:

      eLife assessment

      This study provides valuable information on the mechanism of PepT2 through enhanced-sampling molecular dynamics, backed by cell-based assays, highlighting the importance of protonation of selected residues for the function of a proton-coupled oligopeptide transporter (hsPepT2). The molecular dynamics approaches are convincing, but with limitations that could be addressed in the manuscript, including lack of incorporation of a protonation coordinate in the free energy landscape, possibility of protonation of the substrate, errors with the chosen constant pH MD method for membrane proteins, dismissal of hysteresis emerging from the MEMENTO method, and the likelihood of other residues being affected by peptide binding. Some changes to the presentation could be considered, including a better description of pKa calculations and the inclusion of error bars in all PMFs. Overall, the findings will appeal to structural biologists, biochemists, and biophysicists studying membrane transporters.

      We would like to express our gratitude to the reviewers for providing their feedback on our manuscript, and also for recognising the variety of computational methods employed, the amount of sampling collected and the experimental validation undertaken. Following the individual reviewer comments, as addressed point-by-point below, we will shortly prepare a revised version of this paper. Intended changes to the revised manuscript are marked up in bold font in the detailed responses below, but before that we address some of the comments made above in the general assessment:

      • “lack of incorporation of a protonation coordinate in the free energy landscape”. We acknowledge that it would be highly desirable to treat protonation state changes explicitly and fully coupled to conformational changes. However, at present, evaluating such a free energy landscape is not computationally feasible (especially considering that the non-reactive approach taken here already amounts to almost 1 ms of total sampling time). Previous reports in the literature tend to focus on either simpler systems or a reduced subset of a larger problem. As we were trying to obtain information on the whole transport cycle, we decided to focus here on non-reactive methods.

      • “possibility of protonation of the substrate”. The reviewers are correct in pointing out this possibility, which we had not discussed explicitly in our manuscript. Briefly, while we describe a mechanism in which protonation of only protein residues (with an unprotonated ligand) can account for driving all the necessary conformational changes of the transport cycle, there is some evidence for a further intermediate protonation site in our data (as we commented on in the first version of the manuscript as well), which may or may not be the substrate itself. A future explicit treatment of the proton movements through the transporter, once it becomes computationally tractable, will have to include the substrate as a possible protonation site; for the present moment, we will amend our discussion to alert the reader to the possibility that the substrate could be an intermediate in proton transport. This has repercussions for our study of the E56 pKa value, where – if protons reside with a significant population at the substrate C-terminus – our calculated shift in pKa upon substrate binding could be an overestimate, although we would qualitatively expect the direction of shift to be unaffected. However, we also anticipate that treating this potential coupling explicitly would make convergence of any CpHMD calculation impractical to achieve, and thus for now a semi-quantitative conclusion may be all that can be obtained.

      • “errors with the chosen constant pH MD method for membrane proteins”. We acknowledge that – as reviewer #1 has reminded us – the AMBER implementation of hybrid-solvent CpHMD is not rigorous for membrane proteins, and as such we will add a cautionary note to our paper. We will also explain how the use of the ABFE thermodynamic cycle calculations helps to validate the CpHMD results in a completely orthogonal manner (we will promote this validation which was in the supplementary figures into the main text in the revised version). We therefore remain reasonably confident in the results presented with regards to the reported pKa shift of E56 upon substrate binding, and suggest that if the impact of neglecting the membrane in the implicit-solvent stage of CpHMD is significant, then there is likely an error cancellation when considering shifts induced by the incoming substrate.

      • “dismissal of hysteresis emerging from the MEMENTO method”. We have shown in our method design paper how the use of the MEMENTO method drastically reduces hysteresis compared to steered MD and metadynamics for path generation, and find this improvement again for PepT2 in this study. We will address reviewer #3’s concern about our presentation on this point by revising our introduction of the MEMENTO method, as detailed in the response below.

      • “the likelihood of other residues being affected by peptide binding”. In this study, we have investigated in detail the involvement of several residues in proton-coupled di-peptide transport by PepT2. Short of the potential intermediate protonation site mentioned above, the set of residues we investigate forms a minimal set of sorts within which the important driving forces of alternating access can be rationalised. We have not investigated in substantial detail here the residues involved in holding the peptide in the binding site, as they are well studied in the literature and ligand promiscuity is not the problem of interest here. It remains entirely possible that further processes contribute to the mechanism of driving conformational changes by involving other residues not considered in this paper. We will make our speculation that an ensemble of different processes may be contributing simultaneously more explicit in our revision, but do not believe any of our conclusions would be affected by this.

      As for the additional suggested changes in presentation, we will provide the requested details on the CpHMD analysis. Furthermore, we will use the convergence data presented separately in figures S12 and S16 to include error bars on our 1D-reprojections of the 2D-PMFs in figures 3, 4 and 5. (Note that we will opt not to do so in figures S10 and S15, which collate all 1D PMF reprojections for the OCC ↔ OF and OCC ↔ IF transitions in single reference plots, respectively, to avoid overcrowding those necessarily busy figures.) We are also changing the colour schemes of these plots in our revision to improve accessibility.

      Reviewer #1 (Public Review):

      The authors have performed all-atom MD simulations to study the working mechanism of hsPepT2. It is widely accepted that conformational transitions of proton-coupled oligopeptide transporters (POTs) are linked with gating hydrogen bonds and salt bridges involving protonatable residues, whose protonation triggers gate openings. Through unbiased MD simulations, the authors identified extra-cellular (H87 and D342) and intra-cellular (E53 and E622) triggers. The authors then validated these triggers using free energy calculations (FECs) and assessed the engagement of the substrate (Ala-Phe dipeptide). The linkage of substrate release with the protonation of the ExxER motif (E53 and E56) was confirmed using constant-pH molecular dynamics (CpHMD) simulations and cell-based transport assays. An alternating-access mechanism was proposed. The study was largely conducted properly, and the paper was well-organized. However, I have a couple of concerns for the authors to consider addressing.

      We would like to note here that it may be slightly misleading to the reader to state that “The linkage of substrate release with the protonation of the ExxER motif (E53 and E56) was confirmed using constant-pH molecular dynamics (CpHMD) simulations and cell-based transport assays.” The cell-based transport assays confirmed the importance of the extracellular gating trigger residues H87, S321 and D342 (as mentioned in the preceding sentence), not of the substrate-protonation link as this line might be understood to suggest.

      (1) As a proton-coupled membrane protein, the conformational dynamics of hsPepT2 are closely coupled to protonation events of gating residues. Instead of using semi-reactive methods like CpHMD or reactive methods such as reactive MD, where the coupling is accounted for, the authors opted for extensive non-reactive regular MD simulations to explore this coupling. Note that I am not criticizing the choice of methods, and I think those regular MD simulations were well-designed and conducted. But I do have two concerns.

      a) Ideally, proton-coupled conformational transitions should be modelled using a free energy landscape with two or more reaction coordinates (or CVs), with one describing the protonation event and the other describing the conformational transitions. The minimum free energy path then illustrates the reaction progress, such as OCC/H87D342- → OCC/H87HD342H → OF/H87HD342H as displayed in Figure 3.

      We concur with the reviewer that the ideal way of describing the processes studied in our paper would be as a higher-dimensional free energy landscape obtained from a simulation method that can explicitly model proton-transfer processes. Indeed, it would have been particularly interesting and potentially informative with regard to the movement of protons down into the transporter in the OF → OCC → IF sequence of transitions. As we note in our discussion on the H87→E56 proton transfer:

      “This could be investigated using reactive MD or QM/MM simulations (both approaches have been employed for other protonation steps of prokaryotic peptide transporters, see Parker et al. (2017) and Li et al. (2022)). However, the putative path is very long (≈ 1.7 nm between H87 and E56) and may or may not involve a large number of intermediate protonatable residues, in addition to binding site water. While such an investigation is possible in principle, it is beyond the scope of the present study.”

      Where even sampling the proton transfer step itself in an essentially static protein conformation would be pushing the boundaries of what has been achieved in the field, we believe that considering the current state-of-the-art, a fully coupled investigation of large-scale conformational changes and proton-transfer reaction is not yet feasible in a realistic/practical time frame. We also note this limitation already when we say that:

      “The question of whether proton binding happens in OCC or OF warrants further investigation, and indeed the co-existence of several mechanisms may be plausible here”.

      Nonetheless, we are actively exploring approaches to treat uptake and movement of protons explicitly for future work.

      In our revision, we will expand on our discussion of the reasoning behind employing a nonreactive approach and the limitations that imposes on what questions can be answered in this study.

      Without including the protonation as a CV, the authors tried to model the free energy changes from multiple FECs using different charge states of H87 and D342. This is a practical workaround, and the conclusion drawn (the OCC→ OF transition is downhill with protonated H87 and D342) seems valid. However, I don't think the OF states with different charge states (OF/H87D342-, OF/H87HD342-, OF/H87D342H, and OF/H87HD342H) are equally stable, as plotted in Figure 3b. The concern extends to other cases like Figures 4b, S7, S10, S12, S15, and S16. While it may be appropriate to match all four OF states in the free energy plot for comparison purposes, the authors should clarify this to ensure readers are not misled.

      The reviewer is correct in their assessment that the aligning of PMFs in these figures is arbitrary; no relative free energies of the PMFs to each other can be estimated without explicit free energy calculations at least of protonation events at the end state basins. The PMFs in our figures are merely superimposed for illustrating the differences in shape between the obtained profiles in each condition, as discussed in the text, and we will make this clear in the appropriate figure captions in our revision.
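
      To illustrate the point about superimposed PMFs, the following minimal sketch (not the analysis used in the study) shifts two profiles so that they coincide at a chosen reference basin. The profile values, the units (kJ/mol), and the function names are hypothetical. The sketch shows that such a shift leaves the shape of each profile (for example, the barrier measured from the starting basin) unchanged, while the vertical offset between profiles carries no physical meaning.

```python
def shift_to_reference(pmf, ref_index):
    """Return the PMF shifted so that the value at ref_index is zero."""
    offset = pmf[ref_index]
    return [g - offset for g in pmf]


def barrier_from(pmf, start_index):
    """Highest point of the profile measured from the value at start_index."""
    return max(pmf) - pmf[start_index]


# Hypothetical 1D profiles along an OCC -> OF coordinate, in kJ/mol.
# Index 0 stands for the OCC basin, index -1 for the OF basin.
pmf_standard = [0.0, 5.0, 12.0, 6.0, 3.0]
pmf_protonated = [20.0, 22.0, 24.0, 15.0, 10.0]

# Superimpose both profiles at the OF basin (an arbitrary plotting choice).
aligned_standard = shift_to_reference(pmf_standard, -1)
aligned_protonated = shift_to_reference(pmf_protonated, -1)

# The barrier seen from OCC is the same before and after the shift.
print(barrier_from(pmf_standard, 0), barrier_from(aligned_standard, 0))
print(barrier_from(pmf_protonated, 0), barrier_from(aligned_protonated, 0))
```

      Comparing the absolute heights of the aligned curves would require separate free energy calculations of the protonation events at the end-state basins, as stated above.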

      b) Regarding the substrate impact, it appears that the authors assumed fixed protonation states. I am afraid this is not necessarily the case. Variations in PepT2 stoichiometry suggest that substrates likely participate in proton transport, like the Phe-Ala (2:1) and Phe-Gln (1:1) dipeptides mentioned in the introduction. And it is not rigorous to assume that the N- and C-termini of a peptide do not protonate/deprotonate when transported. I think the authors should explicitly state that the current work and the proposed mechanism (Figure 8) are based on the assumption that the substrates do not uptake/release proton(s).

      This is indeed an assumption inherent in the current work. While we do “speculate that the proton movement processes may happen as an ensemble of different mechanisms, and potentially occur contemporaneously with the conformational change” we do not in the current version indicate explicitly that this may involve the substrate. We will make clear the assumption and this possibility in the revised version of our paper. Indeed, as we discuss, there is some evidence in our PMFs of an additional protonation site not considered thus far, which may or may not be the substrate. We will make note of this point in the revised manuscript.

      As for what information can be drawn from the given experimental stoichiometries, we note in our paper that “a 2:1 stoichiometry was reported for the neutral di-peptide D-Phe-L-Ala and 3:1 for anionic D-Phe-L-Glu. (Chen et al., 1999) Alternatively, Fei et al. (1999) have found 1:1 stoichiometries for either of D-Phe-L-Gln (neutral), D-Phe-L-Glu (anionic), and D-Phe-L-Lys (cationic).”

      We do not assume that it is our place to arbitrate among the apparent discrepancies in the experimental data here, although we believe that our assumed 2:1 stoichiometry is additionally “motivated also by our computational results that indicate distinct and additive roles played by two protons in the conformational cycle mechanism”.

      (2) I have more serious concerns about the CpHMD employed in the study.

      a) The CpHMD in AMBER is not rigorous for membrane simulations. The underlying generalized Born model fails to consider the membrane environment when updating charge states. In other words, the CpHMD places a membrane protein in a water environment to judge if changes in charge states are energetically favorable. While this might not be a big issue for peripheral residues of membrane proteins, it is likely unphysical for internal residues like the ExxER motif. As I recall, the developers have never used the method to study membrane proteins themselves. The only CpHMD variant suitable for membrane proteins is the membrane-enabled hybrid-solvent CpHMD in CHARMM. While I do not expect the authors to redo their CpHMD simulations, I do hope the authors recognize the limitations of their method.

      We will discuss the limitations of the AMBER CpHMD implementation in the revised version. However, despite that, we believe we have in fact provided sufficient grounds for our conclusion that substrate binding affects ExxER motif protonation in the following way:

      In addition to CpHMD simulations, we establish the same effect via ABFE calculations, where the substrate affinity is different at the E56 deprotonated vs protonated protein. This is currently figure S20, though in the revised version we will move this piece of validation into a new panel of figure 6 in the main text, since it becomes more important with the CpHMD membrane problem in mind. Since the ABFE calculations are conducted with an all-atom representation of the lipids and the thermodynamic cycle closes well, it would appear that if the chosen CpHMD method has a systematic error of significant magnitude for this particular membrane protein system, there may be the benefit of error cancellation. While the calculated absolute pKa values may not be reliable, the difference made by substrate binding appears to be so, as judged by the orthogonal ABFE technique.
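
      The link between the two binding free energies and a pKa shift can be made explicit with a closed thermodynamic cycle. The sketch below is illustrative only: the function name, the units (kJ/mol), the default temperature of 310 K, and the example numbers are assumptions, not values from the study. It uses the standard relation ΔpKa = (ΔG_bind(deprotonated) − ΔG_bind(protonated)) / (RT ln 10), so that tighter substrate binding to the protonated protein corresponds to an upward pKa shift on binding.

```python
import math

GAS_CONSTANT_KJ = 0.0083144626  # kJ/(mol*K)


def pka_shift_from_binding(dg_bind_deprot, dg_bind_prot, temperature=310.0):
    """pKa(holo) - pKa(apo) from a closed thermodynamic cycle.

    dg_bind_deprot: binding free energy to the deprotonated protein (kJ/mol)
    dg_bind_prot:   binding free energy to the protonated protein (kJ/mol)
    temperature:    assumed simulation temperature in K (hypothetical default)
    """
    rt_ln10 = GAS_CONSTANT_KJ * temperature * math.log(10.0)
    return (dg_bind_deprot - dg_bind_prot) / rt_ln10


# Hypothetical numbers: the substrate binds 6 kJ/mol more tightly
# when the titratable residue is protonated.
shift = pka_shift_from_binding(dg_bind_deprot=-30.0, dg_bind_prot=-36.0)
print(f"pKa shift on substrate binding: {shift:+.2f}")
```

      Because only the difference of two binding free energies enters, errors common to both legs of the cycle cancel, which is the sense in which such a calculation can check the direction and rough size of a CpHMD-derived shift.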

      Although the reviewer does “not expect the authors to redo their CpHMD simulations”, we consider that it may be helpful to the reader to share in this response some results from trials using the continuous, all-atom constant pH implementation that has recently become available in GROMACS (Aho et al 2022, https://pubs.acs.org/doi/10.1021/acs.jctc.2c00516) and can be used rigorously with membrane proteins, given its all-atom lipid representation.

      Unfortunately, when trying to titrate E56 in this CpHMD implementation, we found few protonation-state transitions taking place, and the system often got stuck in protonation state–local conformation coupled minima (which need to interconvert through rearrangements of the salt bridge network involving slow side-chain dihedral rotations in E53, E56 and R57). Author response image 1 shows this for the apo OF state; Author response image 2 shows how noisy attempts at pKa estimation from this data turn out to be, necessitating the use of a hybrid-solvent method.

      Author response image 1.

      All-atom CpHMD simulations of apo-OF PepT2. Red indicates protonated E56, blue is deprotonated.

      Author response image 2.

      Difficulty in calculating the E56 pKa value from the noisy all-atom CpHMD data shown in Author response image 1
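
      For readers unfamiliar with how a pKa is typically extracted from constant-pH titration data, one common approach is to fit the deprotonated fraction at each simulated pH to a Hill-type form of the Henderson–Hasselbalch equation. The sketch below is a generic illustration, not the authors' analysis script; the function name and the synthetic data are hypothetical. It uses the linearised form log10(f / (1 − f)) = n (pH − pKa) with an ordinary least-squares line.

```python
import math


def fit_pka_hill(ph_values, deprot_fractions):
    """Least-squares fit of log10(f/(1-f)) = n*(pH - pKa).

    Returns (pKa, n). Points with f of exactly 0 or 1 are skipped because
    the logit is undefined there; with few protonation-state transitions
    most points may fall into that category, and the fit becomes unreliable.
    """
    xs, ys = [], []
    for ph, f in zip(ph_values, deprot_fractions):
        if 0.0 < f < 1.0:
            xs.append(ph)
            ys.append(math.log10(f / (1.0 - f)))
    if len(xs) < 2:
        raise ValueError("need at least two pH points with 0 < f < 1")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx              # Hill coefficient n
    intercept = mean_y - slope * mean_x
    return -intercept / slope, slope


# Synthetic, noise-free titration curve with a hypothetical pKa of 6.5.
ph_grid = [4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
fractions = [1.0 / (1.0 + 10.0 ** (6.5 - ph)) for ph in ph_grid]
pka, hill = fit_pka_hill(ph_grid, fractions)
print(f"fitted pKa = {pka:.2f}, Hill coefficient = {hill:.2f}")
```

      When the system stays trapped in one protonation state per window, the fractions cluster near 0 or 1 and estimates of this kind become noisy, which is the kind of difficulty described for Author response image 2.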

      b) It appears that the authors did not make the substrate (Ala-Phe dipeptide) protonatable in holo simulations. This oversight prevents a complete representation of ligand-induced protonation events, particularly given that the substrate ion pairs with hsPepT2 through its N- & C-termini. I believe it would be valuable for the authors to acknowledge this potential limitation.

      In this study, we implicitly assumed from the outset that the substrate does not get protonated, which – as noted in our response to the comment above – we will acknowledge explicitly in revision. This potential limitation for the available mechanisms for proton transfer also applies to our investigation of the ExxER protonation states. In particular, a semi-grand canonical ensemble that takes into account the possibility of substrate C-terminus protonation may also sample states in which the substrate is protonated and oriented away from R57, thus leaving the ExxER salt bridge network in an apo-like state. The consequence would be that while the direction of shift in E56 pKa value will be the same, our CpHMD may overestimate its magnitude. It would thus be interesting to make the C-terminus protonatable for obtaining better quantitative estimates of the E56 pKa shift (as is indeed true in general for any other protein protonatable residue, though the effects are usually assumed to be negligible). We do note, however, that convergence of the CpHMD simulations would be much harder if the slow degree of freedom of substrate reorientation (which in our experience takes tens to hundreds of ns in this binding pocket) needs to be implicitly equilibrated upon protonation state transitions. We will discuss such considerations in the revision.

      Reviewer #2 (Public Review):

      This is an interesting manuscript that describes a series of molecular dynamics studies on the peptide transporter PepT2 (SLC15A2). They examine, in particular, the effect on the transport cycle of protonation of various charged amino acids within the protein. They then validate their conclusions by mutating two of the residues that they predict to be critical for transport in cell-based transport assays. The study suggests a series of protonation steps that are necessary for transport to occur in PepT2. Comparison with bacterial proteins from the same family shows that while the overall architecture of the proteins and likely mechanism are similar, the residues involved in the mechanism may differ.

      Strengths:

      This is an interesting and rigorous study that uses various state-of-the-art molecular dynamics techniques to dissect the transport cycle of PepT2 with nearly 1 ms of sampling. It gives insight into the transport mechanism, investigating how the protonation of selected residues can alter the energetic barriers between various states of the transport cycle. The authors have, in general, been very careful in their interpretation of the data.

      Weaknesses:

      Interestingly, they suggest that there is an additional protonation event that may take place as the protein goes from occluded to inward-facing but they have not identified this residue.

      We have indeed suggested that there may be an additional protonation site involved in the conformational cycle that we have not been able to capture, which – as we discuss in our paper – might be indicated by the shapes of the OCC ↔ IF PMFs given in Figure S15. One possibility is for this to be the substrate itself (see the response to reviewer #1 above) though within the scope of this study the precise pathway by which protons move down the transporter and the exact ordering of conformational change and proton transfer reactions remains a (partially) open question. We acknowledge this and denote it with question marks in the mechanistic overview we give in Figure 8, and also “speculate that the proton movement processes may happen as an ensemble of different mechanisms, and potentially occur contemporaneously with the conformational change”.

      Some things are a little unclear. For instance, where does the state that they have defined as occluded sit on the diagram in Figure 1a? - is it truly the occluded state as shown on the diagram or does it tend to inward- or outward-facing?

      Figure 1a is a simple schematic overview intended to show which structures of PepT2 homologues are available to use in simulations. This was not meant to be a quantitative classification of states. Nonetheless, we can note that the OCC state we derived has extra- and intracellular gate opening distances (as measured by the simple CVs defined in the methods and illustrated in Figure 2a) that indicate full gate closure at both sides. In particular, although it was derived from the IF state via biased sampling, the intracellular gate opening distance in the OCC state used for our conformational change enhanced sampling was comparable to that of the OF state (i.e., full closure of the gate), see Figure S2b and the grey bars therein. Therefore, we would schematically classify the OCC state to lie at the center of the diagram in Figure 1a. Furthermore, it is largely stable over triplicates of 1 μs-long unbiased MD, where in 2/3 replicates the gates remain stable, and in the remaining replicate there is partial opening of the intracellular gate (as shown in Figure 2 b/c under the “apo standard” condition). We comment on this in the main text by saying that “The intracellular gate, by contrast, is more flexible than the extracellular gate even in the apo, standard protonation state”, and link it to the lower barrier for transition to IF than to OF. We did this by saying that “As for the OCC↔OF transitions, these results explain the behaviour we had previously observed in the unbiased MD of Figure 2c.” We acknowledge this was not sufficiently clear and will add details to the latter sentence in revision to help clarify better the nature of the occluded state.

      The pKa calculations and their interpretation are a bit unclear. Firstly, it is unclear whether they are using all the data in the calculations of the histograms, or just selected data and if so on what basis was this selection done. Secondly, they dismiss the pKa calculations of E53 in the outward-facing form as not being affected by peptide binding but say that E56 is when there seems to be a similar change in profile in the histograms.

      In our manuscript, we have provided two distinct analyses of the raw CpHMD data. Firstly, we analysed the data by the replicates in which our simulations were conducted (Figure 6, shown as bar plots with mean from triplicates +/- standard deviation), where we found that only the effect on E56 protonation was distinct as lying beyond the combined error bars. This analysis uses the full amount of sampling conducted for each replicate. However, since we found that the range of pKa values estimated from 10ns/window chunks was larger than the error bars obtained from the replicate analysis (Figures S17 and S18), we sought to verify our conclusion by pooling all chunk estimates and plotting histograms (Figure S19). We recover from those the effect of substrate binding on the E56 protonation state on both the OF and OCC states. However, as the reviewer has pointed out (something we did not discuss in our original manuscript), there is a shift in the pKa of E53 of the OF state only. In fact, the trend is also apparent in the replicate-based analysis of Figure 6, though here the larger error bars overlap. In our revision, we will add more details of these analyses for clarity (including more detailed figure captions regarding the data used in Figure 6) as well as a discussion of the partial effect on the E53 pKa value.
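
      The replicate-based comparison described above can be sketched as follows. This is an illustration under stated assumptions rather than the analysis behind Figure 6: the pKa values are invented, the function names are hypothetical, and "lying beyond the combined error bars" is interpreted here as the difference in means exceeding the sum of the two standard deviations.

```python
from statistics import mean, stdev


def summarize(replicate_pkas):
    """Mean and sample standard deviation over replicate pKa estimates."""
    return mean(replicate_pkas), stdev(replicate_pkas)


def beyond_combined_error(apo_pkas, holo_pkas):
    """True if the mean shift exceeds the sum of both standard deviations.

    This criterion is an assumed reading of 'beyond the combined error bars'.
    """
    apo_mean, apo_sd = summarize(apo_pkas)
    holo_mean, holo_sd = summarize(holo_pkas)
    return abs(holo_mean - apo_mean) > (apo_sd + holo_sd)


# Hypothetical triplicate estimates for two residues.
clear_apo, clear_holo = [6.0, 6.2, 6.1], [7.0, 7.3, 7.1]
weak_apo, weak_holo = [6.0, 6.6, 6.3], [6.4, 7.0, 6.7]

print(beyond_combined_error(clear_apo, clear_holo))
print(beyond_combined_error(weak_apo, weak_holo))
```

      With only three replicates per condition, the standard deviation is itself poorly determined, which is one reason to cross-check against pooled chunk-wise estimates as described above.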

      We do not believe, however, that our key conclusions are negatively affected. If anything, a further effect on the E53 pKa which we had not previously commented on (since we saw the evidence as weaker, pertaining to only one conformational state) would strengthen the case for an involvement of the ExxER motif in ligand coupling.

      Reviewer #3 (Public Review):

      Summary:

      Lichtinger et al. have used an extensive set of molecular dynamics (MD) simulations to study the conformational dynamics and transport cycle of an important member of the proton-coupled oligopeptide transporters (POTs), namely SLC15A2 or PepT2. This protein is one of the most well-studied mammalian POT transporters that provides a good model with enough insight and structural information to be studied computationally using advanced enhanced sampling methods employed in this work. The authors have used microsecond-level MD simulations, constant-pH MD, and alchemical binding free energy calculations along with cell-based transport assay measurements; however, the most important part of this work is the use of enhanced sampling techniques to study the conformational dynamics of PepT2 under different conditions.

      The study attempts to identify links between conformational dynamics and chemical events such as proton binding, ligand-protein interactions, and intramolecular interactions. The ultimate goal is of course to understand the proton-coupled peptide and drug transport by PepT2 and homologous transporters in the solute carrier family.

      Some of the key results include:

      (1) Protonation of H87 and D342 initiates the occluded (Occ) to the outward-facing (OF) state transition.

      (2) In the OF state, through engaging R57, substrate entry increases the pKa value of E56 and thermodynamically facilitates the movement of protons further down.

      (3) E622 is not only essential for peptide recognition but also its protonation facilitates substrate release and contributes to the intracellular gate opening. In addition, cell-based transport assays show that mutation of residues such as H87 and D342 significantly decreases transport activity as expected from simulations.

      Strengths:

      (1) This is an extensive MD-based study of PepT2, which is beyond the typical MD studies both in terms of the sheer volume of simulations as well as the advanced methodology used. The authors have not limited themselves to one approach and have appropriately combined equilibrium MD with alchemical free energy calculations, constant-pH MD, and geometry-based free energy calculations. Each of these 4 methods provides a unique insight regarding the transport mechanism of PepT2.

      (2) The authors have not limited themselves to computational work and have performed experiments as well. The cell-based transport assays clearly establish the importance of the residues that have been identified as significant contributors to the transport mechanism using simulations.

      (3) The conclusions made based on the simulations are mostly convincing and provide useful information regarding the proton pathway and the role of important residues in proton binding, protein-ligand interaction, and conformational changes.

      Weaknesses:

      (1) Some of the statements made in the manuscript are not convincing and do not abide by the standards that are mostly followed in the manuscript. For instance, on page 4, it is stated that "the K64-D317 interaction is formed in only ≈ 70% of MD frames and therefore is unlikely to contribute much to extracellular gate stability." I do not agree that 70% is negligible. Particularly, Figure S3 does not include the time series so it is not clear whether the 30% of the time where the salt bridge is broken is in the beginning or the end of simulations. For instance, it is likely that the salt bridge is not initially present and then it forms very strongly. Of course, this is just one possible scenario but the point is that Figure S3 does not rule out the possibility of a significant role for the K64-D317 salt bridge.

      The reviewer is right to point out that the statement and Figure S3 as they stand do not adequately support our decision to exclude the K64-D317 salt-bridge in our further investigations. The violin plot shown in Figure S3, visualised as pooled data from unbiased 1 μs triplicates, does indeed not rule out a scenario where the salt bridge only formed late in our simulations (or only in some replicates), but then is stable. Therefore, in our revision, we will include the appropriate time-series of the salt bridge distances, showing how K64-D317 is initially stable but then falls apart in replicate 1, and is transiently formed and disengaged across the trajectories in replicates 2 and 3. We will also remake the data for this plot as we discovered a bug in the relevant analysis script that meant the D170-K642 distance was not calculated accurately. The results are however almost identical, and our conclusions remain.
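
      The distinction between pooled occupancy and time-resolved behaviour can be illustrated with a short sketch. It is not the analysis script referred to above; the distance cutoff (0.4 nm), the synthetic distance traces, and the function names are all hypothetical. Two traces with the same overall occupancy of 70% are split into consecutive blocks, which separates a salt bridge that forms late and then persists from one that forms and breaks throughout.

```python
def occupancy(distances, cutoff):
    """Fraction of frames in which the distance is at or below the cutoff."""
    return sum(1 for d in distances if d <= cutoff) / len(distances)


def block_occupancy(distances, cutoff, n_blocks):
    """Occupancy in consecutive, equally sized blocks of the trajectory.

    Trailing frames that do not fill a whole block are ignored.
    """
    size = len(distances) // n_blocks
    return [
        occupancy(distances[i * size:(i + 1) * size], cutoff)
        for i in range(n_blocks)
    ]


CUTOFF_NM = 0.4  # hypothetical salt-bridge distance criterion

# Synthetic traces (nm), 100 frames each, both with 70% overall occupancy.
late_forming = [0.8] * 30 + [0.3] * 70
flickering = ([0.3] * 7 + [0.8] * 3) * 10

for name, trace in (("late forming", late_forming), ("flickering", flickering)):
    print(name, occupancy(trace, CUTOFF_NM), block_occupancy(trace, CUTOFF_NM, 10))
```

      A pooled violin plot cannot tell these two cases apart, whereas the block-wise values (or the raw time series) can.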

      (2) Similarly, on page 4, it is stated that "whether by protonation or mutation - the extracellular gate only opens spontaneously when both the H87 interaction network and D342-R206 are perturbed (Figure S5)." I do not agree with this assessment. The authors need to be aware of the limitations of this approach. Consider "WT H87-prot" and "D342A H87-prot": when D342 residue is mutated, in one out of 3 simulations, we see the opening of the gate within 1 us. When D342 residue is not mutated we do not see the opening in any of the 3 simulations within 1 us. It is quite likely that if rather than 3 we have 10 simulations or rather than 1 us we have 10 us simulations, the 0/3 to 1/3 changes significantly. I do not find this argument and conclusion compelling at all.

      If the conclusions were based on that alone, then we would agree. However, this section of work covers merely the observations of the initial unbiased simulations which we go on to test/explore with enhanced sampling in the rest of the paper, and which then lead us to the eventual conclusions.

      Figure S5 shows the results from triplicate 1 μs-long trajectories as violin-plot histograms of the extracellular gate opening distance, also indicating the first and final frames of the trajectories as connected by an arrow for orientation – a format we chose for intuitively comparing 48 trajectories in one plot. The reviewer reads the plot correctly when they analyse the “WT H87-prot” vs “D342A H87-prot” conditions. In the former case, no spontaneous opening in unbiased MD is taking place, whereas when D342 is mutated to alanine in addition to H87 protonation, we see spontaneous transition in 1 out of 3 replicates. However, the reviewer does not seem to interpret the statement in question in our paper (“the extracellular gate only opens spontaneously when both the H87 interaction network and D342-R206 are perturbed”) in the way we intended it to be understood. We merely want to note here a correlation in the unbiased dataset we collected at this stage, and indeed the one spontaneous opening in the case comparison picked out by the reviewer is in the condition where both the H87 interaction network and D342-R206 are perturbed. In noting this we do not intend to make statistically significant statements from the limited dataset. Instead, we write that “these simulations show a large amount of stochasticity and drawing clean conclusions from the data is difficult”. We do however stand by our assessment that from this limited data we can “already appreciate a possible mechanism where protons move down the transporter pore” – a hypothesis we investigate more rigorously with enhanced sampling in the rest of the paper. We will revise the section in question to make clearer that the unbiased MD is only meant to give an initial hypothesis here to be investigated in more detail in the following sections. In doing so, we will also incorporate, as we had not done before, the case (not picked out by the reviewer here but concerning the same figure) of S321A & H87 prot. In the third replicate, this shows partial gate opening towards the end of the unbiased trajectory (despite D342 not being affected), highlighting further the stochastic nature that makes even clear correlative conclusions difficult to draw.
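      To make concrete why a 0/3 versus 1/3 comparison cannot carry statistical weight, the following sketch applies a two-sided Fisher's exact test to those counts. The test choice and the helper function are illustrative assumptions; this analysis is not part of the manuscript.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x events in row 1 with all margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

# Rows: simulation condition; columns: (gate opened, gate stayed closed).
# 0 of 3 replicates opened in one condition, 1 of 3 in the other.
p = fisher_exact_two_sided(0, 3, 1, 2)
print(p)  # 1.0
```

      With three replicates per condition, a single opening event is fully compatible with chance, consistent with treating the unbiased simulations as hypothesis-generating only.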

      (3) While the MEMENTO methodology is novel and interesting, the method is presented as flawless in the manuscript, which is not true at all. It is stated on Page 5 with regards to the path generated by MEMENTO that "These paths are then by definition non-hysteretic." I think this is too big of a claim to say the paths generated by MEMENTO are non-hysteretic by definition. This claim is not even mentioned in the original MEMENTO paper. What is mentioned is that linear interpolation generates a hysteresis-free path by definition. There are two important problems here: (a) MEMENTO uses the linear interpolation as an initial step but modifies the intermediates significantly later so they are no longer linearly interpolated structures and thus the path is no longer hysteresis-free; (b) a more serious problem is the attribution of by-definition hysteresis-free features to the linearly interpolated states. This is based on conflating the hysteresis-free and unique concepts. The hysteresis in MD-based enhanced sampling is related to the presence of barriers in orthogonal space. For instance, one may use a non-linear interpolation of any type and get a unique pathway, which could be substantially different from the one coming from the linear interpolation. None of these paths will be hysteresis-free necessarily once subjected to MD-based enhanced sampling techniques.

      We certainly do not intend to claim that the MEMENTO method is flawless. The concern the reviewer raises around the statement "These paths are then by definition non-hysteretic" is perhaps best addressed by a clarification of the language used and considering how MEMENTO is applied in this work.

      Hysteresis in the most general sense denotes the dependence of a system on its history, or – more specifically – the lagging behind of the system state with regards to some physical driver (for example the external field in magnetism, whence the term originates). In the context of biased MD and enhanced sampling, hysteresis commonly denotes the phenomenon where a path created by a biased dynamics method along a certain collective variable lags behind in phase space in slow orthogonal degrees of freedom (see Figure 1 in Lichtinger and Biggin 2023, https://doi.org/10.1021/acs.jctc.3c00140). When used to generate free energy profiles, this can manifest as starting state bias, where the conformational state that was used to seed the biased dynamics appears lower in free energy than alternative states. Figure S6 shows this effect on the PepT2 system for both steered MD (heavy atom RMSD CV) + umbrella sampling (tip CV) and metadynamics (tip CV). There is, in essence, a coupled problem: without an appropriate CV (which we did not have to start with here), path generation that is required for enhanced sampling displays hysteresis, but the refinement of CVs is only feasible when paths connecting the true phase space basins of the two conformations are available. MEMENTO helps solve this issue by reconstructing protein conformations along morphing paths which perform much better than steered MD paths with respect to giving consistent free energy profiles (see Figure S7 and the validation cases in the MEMENTO paper), even if the same CV is used in umbrella sampling.

      There are still differences between replicates in those PMFs, indicating slow conformational flexibility propagated from end-state sampling through MEMENTO. We use this to refine the CVs further with dimensionality reduction (see the Methods section and Figure S8), before moving to 2D-umbrella sampling (Figure 3). This, we think, is where the reviewer’s point applies. The MEMENTO paths are ‘non-hysteretic by definition’ with respect to given end states in the sense that they connect (by definition) the correct conformations at both end-states (unlike steered MD), which in enhanced sampling manifests as the absence of the strong starting-state bias we had previously observed (Figure S7 vs S6). They are not, however, hysteresis-free with regards to how representative of the end-state conformational flexibility the structures given to MEMENTO really were, which is where the iterative CV design and combination of several MEMENTO paths in 2D-PMFs comes in.

      We also cannot make a direct claim about whether in the transition region the MEMENTO paths might be separated from the true (lower free energy) transition paths by slow orthogonal degrees of freedom, which may conceivably result in overestimated barrier heights separating two free energy basins. We cannot guarantee that this is not the case, but neither in our MEMENTO validation examples nor in this work have we encountered any indications of a problem here.

      We hope that the reviewer will be satisfied by our revision, where we will replace the wording in question by a statement that the MEMENTO paths do not suffer from hysteresis that is otherwise incurred as a consequence of not reaching the correct target state in the biased run (in some orthogonal degrees of freedom).

    1. Author Response

      Response to the Reviews

      We are grateful for these balanced, nuanced evaluations of our work concerning the observed epistatic trends and our interpretations of their mechanistic origins. Overall, we think the reviewers have done an excellent job at recognizing the novel aspects of our findings while also discussing the caveats associated with our interpretations of the biophysical effects of these mutations. We believe it is important to consider both of these aspects of our work in order to appreciate these advances and what sorts of pertinent questions remain.

      Notably, both reviewers suggest that a lack of experimental approaches to compare the conformational properties of GnRHR variants weakens our claims. We would first humbly suggest that this constitutes a more general caveat that applies to nearly all investigations of the cellular misfolding of α-helical membrane proteins. Whether or not any current in vitro folding measurements report on conformational transitions that are relevant to cellular protein misfolding reactions remains an active area of debate (discussed further below). Nevertheless, while we concede that our structural and/or computational evaluations of various mutagenic effects remain speculative, prevailing knowledge on the mechanisms of membrane protein folding suggests our mutations of interest (V276T and W107A) are highly unlikely to promote misfolding in precisely the same way. Thus, regardless of whether or not we were able to experimentally compare the relevant folding energetics of GnRHR variants, we are confident that the distinct epistatic interactions formed by these mutations reflect variations in the misfolding mechanism and that they are distinct from the interactions that are observed in the context of stable proteins. In the following, we provide detailed considerations concerning these caveats in relation to the reviewers’ specific comments.

      Reviewer #1 (Public Review):

      The paper carries out an impressive and exhaustive non-sense mutagenesis using deep mutational scanning (DMS) of the gonadotropin-releasing hormone receptor for the WT protein and two single point mutations that (i) influence TM insertion (V276T) and (ii) influence protein stability (W107A), and then measures the effect of these mutants on correct plasma membrane expression (PME).

      Overall, most mutations decreased mGnRHR PME levels in all three backgrounds, indicating poor mutational tolerance under these conditions. The W107A variant wasn't really recoverable with low levels of plasma membrane localisation. For the V276T variant, most additional mutations were more deleterious than WT based on correct trafficking, indicating a synergistic effect. As one might expect, there was a higher degree of positive correlation between V276T/W107A mutants and other mutants located in TM regions, confirming that improper trafficking was a likely consequence of membrane protein co-translational folding. Nevertheless, context is important, as positive synergistic mutants in the V276T background could be negative in the W107A background and vice versa. Taken together, this important study highlights the complexity of membrane protein folding in dissecting the mechanism-dependent impact of disease-causing mutations related to improper trafficking.

      Strengths

      This is a novel and exhaustive approach to dissecting how receptor mutations under different mutational backgrounds related to co-translational folding, could influence membrane protein trafficking.

      Weaknesses

      The premise for the study requires an in-depth understanding of how the single-point mutations analysed affect membrane protein folding, but the single-point mutants used seem to lack proper validation.

      Given our limited understanding of the structural properties of misfolded membrane proteins, it is unclear whether the relevant conformational effects of these mutations can be unambiguously validated using current biochemical and/or biophysical folding assays. X-ray crystallography, cryo-EM, and NMR spectroscopy measurements have demonstrated that many purified GPCRs retain native-like structural ensembles within certain detergent micelles, bicelles, and/or nanodiscs. However, helical membrane protein folding measurements typically require titration with denaturing detergents to promote the formation of a denatured state ensemble (DSE), which will invariably retain considerable secondary structure. Given that the solvation provided by mixed micelles is clearly distinct from that of native membranes, it remains unclear whether these DSEs represent a reasonable proxy for the misfolded conformations recognized by cellular quality control (QC, see https://doi.org/10.1021/acs.chemrev.8b00532). Thus, the use and interpretation of these systems for such purposes remains contentious in the membrane protein folding community. In addition to this theoretical issue, we are unaware of any instances in which GPCRs have been found to undergo reversible denaturation in vitro, a practical requirement for equilibrium folding measurements (https://doi.org/10.1146/annurev-biophys-051013-022926). We note that, while the resistance of GPCRs to aggregation, proteolysis, and/or mechanical unfolding has also been probed in micelles, it is again unclear whether the associated thermal, kinetic, and/or mechanical stability should necessarily correspond to their resistance to cotranslational and/or posttranslational misfolding. Thus, even if we had attempted to validate the computational folding predictions employed herein, we suspect that any resulting correlations with cellular expression may have justifiably been viewed by many as circumstantial. Simply put, we know very little about the non-native conformations that are generally involved in the cellular misfolding of α-helical membrane proteins, much less how to measure their relative abundance. From a philosophical standpoint, we prefer to let cells tell us what sorts of broken protein variants are degraded by their QC systems, then do our best to surmise what this tells us about the relevant properties of cellular DSEs.

      Despite this fundamental caveat, we believe that the chosen mutations and our interpretation of their relevant conformational effects are reasonably well-informed by current modeling tools and by prevailing knowledge on the physicochemical drivers of membrane protein folding and misfolding. Specifically, the mechanistic constraints of translocon-mediated membrane integration provide an understanding of the types of mutations that are likely to disrupt cotranslational folding. Though we are still learning about the protein complexes that mediate membrane translocation (https://doi.org/10.1038/s41586-022-05336-2), it is known that this underlying process is fundamentally driven by the membrane depth-dependent amino acid transfer free energies (https://doi.org/10.1146/annurev.biophys.37.032807.125904). This energetic consideration suggests introducing polar side chains near the center of a nascent TMD should almost invariably reduce the efficiency of topogenesis. To test this in the context of TMD6 specifically, we utilized a well-established biochemical reporter system to confirm that V276T attenuates its translocon-mediated membrane integration (Fig. S1), at least in the context of a chimeric protein. We also constructed a glycosylation-based topology reporter for full-length GnRHR, but ultimately found its in vitro expression to be insufficient to detect changes in the nascent topological ensemble. In contrast to V276T, the W107A mutation is predicted to preserve the native topological energetics of GnRHR due to its position within a soluble loop region. W107A is also unlike V276T in that it clearly disrupts tertiary interactions that stabilize the native structure. This mutation should preclude the formation of a structurally conserved hydrogen bonding network that has been observed in the context of at least 25 native GPCR structures (https://doi.org/10.7554/eLife.5489). However, without a relevant folding assay, the extent to which this network stabilizes the native GnRHR fold in cellular membranes remains unclear. Overall, we admit that these limitations have prevented us from measuring how much V276T alters the efficiency of GnRHR topogenesis, how much W107A destabilizes the native fold, or vice versa. Nevertheless, given these design principles and the fact that both reduce the plasma membrane expression of GnRHR, as expected, we are highly confident that the structural defects generated by these mutations do, in fact, promote misfolding in their own ways. We also concede that the degree to which these mutagenic perturbations are indeed selective for specific folding processes is somewhat uncertain. However, it seems exceedingly unlikely that these mutations should disrupt topogenesis and/or the folding of the native topomer to the exact same extent. From our perspective, this is the most important consideration with respect to the validity of the conclusions we have made in this manuscript.

      Furthermore, plasma membrane expression has been used as a proxy for incorrect membrane protein folding, but this may not necessarily be the case, as even correctly folded membrane proteins may not be trafficked correctly, at least under heterologous expression conditions. In addition, mutations can affect trafficking and potential post-translational modifications, like glycosylation.

      While the reviewer is correct that the sorting of folded proteins within the secretory pathway is generally inefficient, it is also true that the maturation of nascent proteins within the ER generally bottlenecks the plasma membrane expression of most α-helical membrane proteins. Our group and several others have demonstrated that the efficiency of ER export generally appears to scale with the propensity of membrane proteins to achieve their correct topology and/ or to achieve their native fold (see https://doi.org/10.1021/jacs.5b03743 and https://doi.org/10.1021/jacs.8b08243). Notably, these investigations all involved proteins that contain native glycosylation and various other post-translational modification sites. While we cannot rule out that certain specific combinations of mutations may alter expression through their perturbation of post-translational GnRHR modifications, we feel confident that the general trends we have observed across hundreds of variants predominantly reflect changes in folding and cellular QC. This interpretation is supported by the relationship between observed trends in variant expression and Rosetta-based stability calculations, which we identified using unbiased unsupervised machine learning approaches (compare Figs. 6B & 6D).

      Reviewer #2 (Public Review):

      Summary:

      In this paper, Chamness and colleagues make a pioneering effort to map epistatic interactions among mutations in a membrane protein. They introduce thousands of mutations to the mouse GnRH Receptor (GnRHR), either under wild-type background or two mutant backgrounds, representing mutations that destabilize GnRHR by distinct mechanisms. The first mutant background is W107A, destabilizing the tertiary fold, and the second, V276T, perturbing the efficiency of cotranslational insertion of TM6 to the membrane, which is essential for proper folding. They then measure the surface expression of these three mutant libraries, using it as a proxy for protein stability, since misfolded proteins do not typically make it to the plasma membrane. The resulting dataset is then used to shed light on how diverse mutations interact epistatically with the two genetic background mutations. Their main conclusion is that epistatic interactions vary depending on the degree of destabilization and the mechanism through which they perturb the protein. The mutation V276T forms primarily negative (aggravating) epistatic interactions with many mutations, as is common to destabilizing mutations in soluble proteins. Surprisingly, W107A forms many positive (alleviating) epistatic interactions with other mutations. They further show that the locations of secondary mutations correlate with the types of epistatic interactions they form with the above two mutants.

      Strengths:

      Such a high throughput study for epistasis in membrane proteins is pioneering, and the results are indeed illuminating. Examples of interesting findings are that: (1) No single mutation can dramatically rescue the destabilization introduced by W107A. (2) Epistasis with a secondary mutation is strongly influenced by the degree of destabilization introduced by the primary mutation. (3) Misfolding caused by mis-insertion tends to be aggravated by further mutations. The discussion of how protein folding energetics affects epistasis (Fig. 7) makes a lot of sense and lays out an interesting biophysical framework for the findings.

      Weaknesses:

      The major weakness comes from the potential limitations in the measurements of surface expression of severely misfolded mutants. This point is discussed quite fairly in the paper, in statements like "the W107A variant already exhibits marginal surface immunostaining" and many others. It seems that only about 5% of the W107A makes it to the plasma membrane compared to wild-type (Figures 2 and 3). This might be a low starting point from which to accurately measure the effects of secondary mutations.

      The reviewer raises an excellent point that we considered at length during the analysis of these data and the preparation of the manuscript. Though we remain confident in the integrity of these measurements and the corresponding analyses, we now realize this aspect of the data merits further discussion and documentation in our forthcoming revision, in which we will outline the following specific lines of reasoning.

      Still, the authors claim that measurements of W107A double mutants "still contain cellular subpopulations with surface immunostaining intensities that are well above or below that of the W107A single mutant, which suggests that this fluorescence signal is sensitive enough to detect subtle differences in the PME of these variants". I was not entirely convinced that this was true.

      We made this statement based on the simple observation that the surface immunostaining intensities across the population of recombinant cells expressing the library of W107A double mutants was consistently broader than that of recombinant cells expressing W107A GnRHR alone (see Author response image 1 for reference). Given that the recombinant cellular library represents a mix of cells expressing ~1600 individual variants that are each present at low abundance, the pronounced tails within this distribution presumably represent the composite staining of many small cellular subpopulations that express collections of variants that deviate from the expression of W107A to an extent that is significant enough to be visible on a log intensity plot.

      Author response image 1.

      Firstly, I think it would be important to test how much noise these measurements have and how much surface immunostaining the W107A mutant displays above the background of cells that do not express the protein at all.

      For reference, the average surface immunostaining intensity of HEK293T cells transiently expressing W107A GnRHR was 2.2-fold higher than that of the IRES-eGFP negative, untransfected cells within the same sample- the WT immunostaining intensity was 9.5-fold over background by comparison. Similarly, recombinant HEK293T cells expressing the W107A double mutant library had an average surface immunostaining intensity that was 2.6-fold over background across the two DMS trials. Thus, while the surface immunostaining of this variant is certainly diminished, we were still able to reliably detect W107A at the plasma membrane even under distinct expression regimes. We will include these and other signal-to-noise metrics for each experiment in a new table in the revised version of this manuscript.

      Beyond considerations related to intensity, we also previously noticed the relative intensity values for W107A double mutants exhibited considerable precision across our two biological replicates. If signal were too poor to detect changes in variant expression, we would have expected a plot of the intensity values across these two replicates to form a scatter. Instead, we found DMS intensity values for individual variants to be highly correlated from one replicate to the next (Pearson’s R= 0.97, see Author response image 2 for reference). This observation empirically demonstrates that this assay consistently differentiated between variants that exhibit slightly enhanced immunostaining from those that have even lower immunostaining than W107A GnRHR.

      Author response image 2.
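      The replicate-reproducibility argument above rests on a Pearson correlation between two sets of per-variant intensity values. A minimal sketch of that calculation follows. The data are synthetic, and the noise levels and variable names are assumptions for illustration only; only the approximate library size (~1600 variants) is taken from the response.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length arrays."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float(np.sum(xc * yc) / np.sqrt(np.sum(xc**2) * np.sum(yc**2)))

rng = np.random.default_rng(0)
n_variants = 1600  # approximate library size quoted in the response

# Hypothetical scenario A: a shared per-variant signal plus small replicate noise.
signal = rng.normal(size=n_variants)
rep1 = signal + rng.normal(scale=0.1, size=n_variants)
rep2 = signal + rng.normal(scale=0.1, size=n_variants)

# Hypothetical scenario B: no detectable signal, replicates are independent noise.
noise1 = rng.normal(size=n_variants)
noise2 = rng.normal(size=n_variants)

print(pearson_r(rep1, rep2))      # close to 1 when the signal dominates
print(pearson_r(noise1, noise2))  # close to 0 when only noise is measured
```

      If the assay could not resolve differences among variants, replicate values would resemble scenario B; a high replicate correlation is the behaviour expected under scenario A.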

      But more importantly, it is not clear if under this regimen surface expression still reports on stability/protein fitness. It is unknown if the W107A retains any function or folding at all. For example, it is possible that the low amount of surface protein represents misfolded receptors that escaped the ER quality control.

      While we believe that such questions are outside the scope of this work, we certainly agree that it is entirely possible that some of these variants bypass QC without achieving their native fold. This topic is quite interesting to us but is quite challenging to assess in the context of GPCRs, which have complex fitness landscapes that involve their propensity to distinguish between different ligands, engage specific components associated with divergent downstream signaling pathways, and navigate between endocytic recycling/ degradation pathways following activation. In light of the inherent complexity of GPCR function, we humbly suggest our choice of a relatively simple property of an otherwise complex protein may be viewed as a virtue rather than a shortcoming. Protein fitness is typically cast as the product of abundance and activity. Rather than measuring an oversimplified, composite fitness metric, we focused on one variable (plasma membrane expression) and its dominant effector (folding). We believe restraining the scope in this manner was key for the elucidation of clear mechanistic insights.

      The differential clustering of epistatic mutations (Fig. 6) provides some interesting insights as to the rules that dictate epistasis, but these too are dominated by the magnitude of destabilization caused by one of the mutations. In this case, the secondary mutations that had the most interesting epistasis were exceedingly destabilizing. With this in mind, it is hard to interpret the results that emerge regarding the epistatic interactions of W107A. Furthermore, the most significant positive epistasis is observed when W107A is combined with additional mutations that almost completely abolish surface expression. It is likely that either mutation destabilizes the protein beyond repair. Therefore, what we can learn from the fact that such mutations have positive epistasis is not clear to me. Based on this, I am not sure that another mutation that disrupts the tertiary folding more mildly would not yield different results. With that said, I believe that the results regarding the epistasis of V276T with other mutations are strong and very interesting on their own.

      We agree with the reviewer. In light of our results we believe it is virtually certain that the secondary mutations characterized herein would be likely to form distinct epistatic interactions with mutations that are only mildly destabilizing. Indeed, this insight reflects one of the key takeaway messages from this work- stability-mediated epistasis is difficult to generalize because it should depend on the extent to which each mutation changes the stability (ΔΔG) as well as initial stability of the WT/ reference sequence (ΔG, see Figure 7). Frankly, we are not so sure we would have pieced this together as clearly had we not had the fortune (or misfortune?) of including such a destructive mutation like W107A as a point of reference.
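      The dependence of epistasis on both ΔΔG and the starting ΔG can be sketched with a simple two-state folding model. This is an illustrative assumption and not the scoring used in the manuscript: epistasis is taken here as the deviation from additivity in fraction folded, the ΔΔG values are assumed to add, and all energies are hypothetical.

```python
import math

RT = 0.0019872 * 298.15  # kcal/mol at 25 degrees C

def fraction_folded(dG_fold):
    """Two-state fraction folded; dG_fold < 0 means the folded state is favoured."""
    return 1.0 / (1.0 + math.exp(dG_fold / RT))

def additive_epistasis(dG_wt, ddG_a, ddG_b):
    """Deviation of the double mutant from additivity in fraction folded,
    assuming the two ddG values simply add."""
    f_wt = fraction_folded(dG_wt)
    f_a = fraction_folded(dG_wt + ddG_a)
    f_b = fraction_folded(dG_wt + ddG_b)
    f_ab = fraction_folded(dG_wt + ddG_a + ddG_b)
    return (f_ab - f_wt) - ((f_a - f_wt) + (f_b - f_wt))

dG_wt = -3.0      # hypothetical stability of the reference sequence (kcal/mol)
secondary = 2.0   # hypothetical destabilization by a secondary mutation

# The same secondary mutation paired with a mild or a severe primary mutation.
print(additive_epistasis(dG_wt, 2.0, secondary))  # mild primary: negative epistasis
print(additive_epistasis(dG_wt, 5.0, secondary))  # severe primary: positive epistasis
```

      Under these assumptions the sign of the epistasis flips with the severity of the primary mutation, because a protein that is already largely unfolded has little expression left to lose.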

      Additionally, the study draws general conclusions from the characterization of only two mutations, W107A and V276T. At this point, it is hard to know if other mutations that perturb insertion or tertiary folding would behave similarly. This should be emphasized in the text.

      We agree and will be sure to emphasize this point in the revised manuscript.

      Some statistical aspects of the study could be improved:

      1. It would be nice to see the level of reproducibility of the biological replicates in a plot, such as scatter or similar, with correlation values that give a sense of the noise level of the measurements. This should be done before filtering out the inconsistent data.

      We thank the reviewer for this suggestion and will include scatters for each genetic background like the one shown above in the supplement of the revised version of the manuscript.

      2. The statements "Variants bearing mutations within the C-terminal region (ICL3-TMD6-ECL3-TMD7) fare consistently worse in the V276T background relative to WT (Fig. 4 B & E)." and "In contrast, mutations that are better tolerated in the context of W107A mGnRHR are located throughout the structure but are particularly abundant among residues in the middle of the primary structure that form TMD4, ICL2, and ECL2 (Fig. 4 C & F)." are both hard to judge. Inspecting Figures 4B and C does not immediately show these trends, and importantly, a solid statistical test is missing here. In Figures 4E and F the locations of the different loops and TMs are not indicated on the structure, making these statements hard to judge.

      We apologize for this oversight and thank the reviewer for pointing this out. We will include additional statistical tests to reinforce these conclusions in the revised version of the manuscript.

      3. The following statement lacks a statistical test: "Notably, these 98 variants are enriched with TMD variants (65% TMD) relative to the overall set of 251 variants (45% TMD)." Is this enrichment significant? Further in the same paragraph, the claim that "In contrast to the sparse epistasis that is generally observed between mutations within soluble proteins, these findings suggest a relatively large proportion of random mutations form epistatic interactions in the context of unstable mGnRHR variants" needs to be backed by relevant data and statistics, or at least a reference.

      We will include additional statistical tests for this in the revised manuscript and will ensure the language we use is consistent with the strength of the indicated statistical enrichment.
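      One possible form for such an enrichment test is a one-sided hypergeometric test, sketched below. This is an assumption about how the test could be set up, not the analysis that will appear in the revision. The counts are reconstructed from the rounded percentages quoted by the reviewer, and the 98 variants are assumed to be a subset of the 251.

```python
from math import comb

def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when drawing n items without replacement from N items,
    K of which belong to the category of interest."""
    denom = comb(N, n)
    upper = min(K, n)
    return sum(comb(K, x) * comb(N - K, n - x) for x in range(k, upper + 1)) / denom

# Counts reconstructed from rounded percentages (an approximation).
n_all = 251                          # overall set of variants
n_subset = 98                        # variants in the subset of interest
tmd_all = round(0.45 * n_all)        # TMD variants overall, ~45%
tmd_subset = round(0.65 * n_subset)  # TMD variants in the subset, ~65%

p = hypergeom_upper_tail(tmd_subset, n_all, tmd_all, n_subset)
print(f"P(at least {tmd_subset} TMD variants among {n_subset}) = {p:.3g}")
```

      Because the inputs are rounded, the resulting p-value is only indicative; the exact variant counts from the dataset would be needed for a reportable statistic.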

    1. Author response:

      Reviewer #1 (Public review):

      This work regards the role of Aurora Kinase A (AurA) in trained immunity. The authors claim that AurA is essential to the induction of trained immunity. The paper starts with a series of experiments showing the effects of suppressing AurA on beta-glucan-trained immunity. This is followed by an account of how AurA inhibition changes the epigenetic and metabolic reprogramming that are characteristic of trained immunity. The authors then zoom in on specific metabolic and epigenetic processes (regulation of S-adenosylmethionine metabolism & histone methylation). Finally, an inhibitor of AurA is used to reduce beta-glucan's anti-tumour effects in a subcutaneous MC-38 model.

      Strengths:

      With the exception of my confusion around the methods used for relative gene expression measurements, the experimental methods are generally well-described. I appreciate the authors' broad approach to studying different key aspects of trained immunity (from comprehensive transcriptome/chromatin accessibility measurements to detailed mechanistic experiments). Approaching the hypothesis from many different angles inspires confidence in the results (although not completely - see weaknesses section). Furthermore, the large drug-screening panel is a valuable tool as these drugs are readily available for translational drug-repurposing research.

      We thank the reviewer for the positive and encouraging comments.

      Weaknesses:

      (1) The manuscript contains factual inaccuracies such as: (a) Intro: the claim that trained cells display a shift from OXPHOS to glycolysis based on the paper by Cheng et al. in 2014; this was later shown to be dependent on the dose of stimulation and actually both glycolysis and OXPHOS are generally upregulated in trained cells (pmid 32320649).

      We thank the reviewer for pointing out this inaccuracy, and we will revise our statement to ensure an accurate and up-to-date description. We are aware that trained immunity involves different metabolic pathways, including both glycolysis and oxidative phosphorylation[1, 2]. We also measured the oxygen consumption rate (OCR, as detailed in comment#8) but observed no increase in oxygen consumption in trained BMDMs, whereas a previous study reported decreased oxidative phosphorylation[3]. We will discuss the potential reasons underlying these different results.

      (b) Discussion: Trained immunity was first described as such in 2011, not decades ago.

      We are sorry for the inaccurate description, and we will correct the statement in our revised manuscript as “Despite the fact that the concept of “trained immunity” has been proposed since 2011, the mechanisms that regulate trained immunity are still not completely understood.”

      (2) The authors approach their hypothesis from different angles, which inspires a degree of confidence in the results. However, the statistical methods and reporting are underwhelming.

      (a) Graphs depict mean +/- SEM, whereas mean +/- SD is almost always more informative. (b) The use of 1-tailed tests is dubious in this scenario. Furthermore, in many experiments/figures the case could be made that the comparisons should be considered paired (the responses of cells from the same animal are inherently not independent due to their shared genetic background and, up until cell isolation, the same host factors like serum composition/microbiome/systemic inflammation etc). (c) It could be explained a little more clearly how multiple testing correction was done and why specific tests were chosen in each instance.

      Thank you for the suggestions. We will revise all data presented as mean ± SEM in the manuscript to mean ± SD, provide a detailed description of how multiple comparisons were performed, and explain the rationale behind the different comparison methods used. Previous studies have shown that knockdown of GNMT increases the intracellular SAM level, and knockdown of GNMT is commonly used as a method to upregulate SAM[4-6]. We therefore used a one-tailed test in Figure 3J.
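
      The relationship between the two error measures discussed here can be illustrated with a minimal sketch. It assumes the sample standard deviation (n - 1 denominator) and uses hypothetical function names; the input values in any usage are placeholders, not data from the manuscript.

```python
import math
import statistics


def sd_and_sem(values):
    """Return (SD, SEM) for a list of replicate measurements.

    SD is the sample standard deviation; SEM = SD / sqrt(n).
    """
    sd = statistics.stdev(values)
    sem = sd / math.sqrt(len(values))
    return sd, sem


def sd_from_sem(sem, n):
    """Recover the SD from a reported SEM and the number of replicates n."""
    return sem * math.sqrt(n)
```

      With n = 3, the SD is about 1.73 times the SEM, so error bars drawn as SD are visibly wider than the same data drawn as SEM.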

      (d) Most experiments are done with n = 3, some experiments are done with n = 5. This is not a lot. While I don't think power analyses should be required for simple in vitro experiments, I would be wary of drawing conclusions based on n = 3. It is also not indicated if the data points were acquired in independent experiments. ATAC-seq/RNA-seq was, judging by the figures, done on only 2 mice per group. No power calculations were done for the in vivo tumor model.

      We apologize for the confusing descriptions in the figure legends. For the in vitro studies, we performed at least three independent experiments (BMs isolated from different mice), but the manuscript displays technical-replicate data from only one experiment. For the sequencing data, we acknowledge the reviewer's concern regarding the small sample size (n=2) in our RNA-seq/ATAC-seq experiment. We consider the sequencing experiment mainly an exploratory approach, and we performed rigorous quality control and normalization of the sequencing data to ensure the reliability of our findings. While we understand that a larger sample size would be ideal for drawing more definitive conclusions, we believe that the current data offer valuable preliminary insights that can inform future studies with larger cohorts. As a complementary method, we conducted ChIP-PCR to detect enrichment of active histone modifications in the Il6 and Tnf regions, to further verify the increased accessibility of inflammatory genes induced by trained immunity and the reliability of our conclusions despite the small sample size. We hope this clarifies our approach, and we would be happy to further acknowledge and discuss the limitations of the current study.

      For the in vivo experiment, we determined the sample size by referring to the animal numbers used for similar experiments in the literature. According to a reported resource equation approach for calculating sample size in animal studies[7], n = 5-7 is suitable for most of our mouse experiments. We will describe the details in the revised Methods section.
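
      For readers unfamiliar with the resource equation approach, a minimal sketch follows. It assumes the one-way group-comparison form, in which the error degrees of freedom (total animals minus number of groups) should lie between 10 and 20, so that n per group = DF/k + 1. The function name is hypothetical, and the number of groups k is an assumption because the response does not state it.

```python
import math


def resource_equation_range(k, df_min=10, df_max=20):
    """Per-group sample size range for k groups (one-way comparison).

    The error degrees of freedom, k * (n - 1), are kept between
    df_min and df_max. The minimum is rounded up and the maximum
    is rounded down so that both bounds stay inside that window.
    """
    n_min = math.ceil(df_min / k + 1)
    n_max = math.floor(df_max / k + 1)
    return n_min, n_max
```

      Under these assumptions, three groups give a range of 5 to 7 animals per group, and four groups give 4 to 6.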

      (e) Furthermore, the data spread in many experiments (particularly BMDM experiments) is extremely small. I wonder if these are true biological replicates, meaning each point represents BMDMs from a different animal? (disclaimer: I work with human materials where the spread is of course always much larger than in animal experiments, so I might be misjudging this.).

      We apologize for the confusing descriptions in the figure legends. In the in vivo experiments, individual mice are the biological replicates; the exact values of n are reported in the figure legends, and each point represents data from a different animal (Figure 1I and Figure 6). The in vitro cell assays were performed in triplicate, each experiment was independently replicated at least three times, and the points represent technical replicates.

      (3) Maybe the authors are reserving this for a separate paper, but it would be fantastic if the authors would report the outcomes of the entire drug screening instead of only a selected few. The field would benefit from this as it would save needless repeat experiments. The list of drugs contains several known inhibitors of training (e.g. mTOR inhibitors) so there must have been more 'hits' than the reported 8 Aurora inhibitors.

      Thank you for your suggestion and we will report the outcomes of the entire drug screening in the revised manuscript.

      (4) Relating to the drug screen and subsequent experiments: it is unclear to me in supplementary figure 1B which concentrations belong to secondary screens #1/#2 - the methods mention 5 µM for the primary screen and "0.2 and 1 µM" for secondary screens, is it in this order or in order of descending concentration?

      Thank you for your comments, and we apologize for the unclear labelling in supplementary figure 1B. We performed the secondary drug screen at two concentrations: secondary screens #1 and #2 correspond to 0.2 and 1 μM, respectively. That is, the concentrations are listed in this order, not in order of descending concentration.

      (a) It is unclear if the drug screen was performed with technical replicates or not - the supplementary figure 1B suggests no replicates and quite a large spread (in some cases lower concentration works better?)

      Thank you for your question. The drug screen was performed without technical replicates. We did observe that a lower concentration works better in some cases. This might be because a drug's effect correlates positively with its concentration only within a specific range (as seen in comment#4).

      (5) The methods for (presumably) qPCR for measuring gene expression in Figure 1C are missing. Which reference gene was used and is this a suitably stable gene?

      We apologize for omitting the qPCR method. The mRNA expression of Il6 and Tnf in trained BMDMs was normalized to that in untrained BMDMs, with β-actin serving as the reference gene. We will describe this in detail in our revised manuscript.

      (6) From the complete unedited blot image of Figure 1D it appears that the p-Aurora and total Aurora are not from the same gel (discordant number of lanes and positioning). This could be alright if there are no/only slight technical errors, but I find it misleading as it is presented as if the actin (loading control to account for aforementioned technical errors!) counts for the entire figure.

      Thanks for this comment. In the original data, p-Aurora and total Aurora were from different gels. In this experiment, membrane stripping and reprobing after the p-Aurora antibody did not work well, so we could not obtain all results from one gel and had to run another gel using the same samples to blot with the anti-Aurora antibody. We agree that we should have provided separate actin blots as loading controls for this experiment. We will repeat the experiment and provide original data from three biological replicates to confirm the result.

      Figure 2: This figure highlights results that are by far not the strongest ones - I think the 'top hits' deserve some more glory. A small explanation on why the highlighted results were selected would have been fitting.

      We appreciate the valuable suggestion. We will add a discussion of this point to our revised manuscript.

      (7) Figure 3 incl supplement: the carbon tracing experiments show more glucose-carbon going into TCA cycle (suggesting upregulated oxidative metabolism), but no mito stress test was performed on the seahorse.

      We appreciate this question raised by the reviewer. We previously performed Seahorse XF analysis to measure mitochondrial stress in β-glucan-trained BMDMs in combination with alisertib (data not shown in our submitted manuscript). The results showed no increase in oxidative phosphorylation under β-glucan stimulation.

      Author response image 1.

      (8) Inconsistent use of an 'alisertib-alone' control in addition to 'medium', 'b-glucan', 'b-glucan + alisertib'. This control would be of great added value in many cases, in my opinion.

      Thank you for your comment. We appreciate that including an “alisertib-alone” group throughout all the experiments may add more value to the findings. The aim of the current study was to investigate the role of Aurora kinase A in trained immunity. Therefore, in most settings, we did not focus on the role of Aurora kinase A without β-glucan stimulation. Initially, we showed in Figure 1B and 1C that alisertib alone at concentrations up to and including 1 μM does not affect the response to the secondary stimulus. In a previous report, the authors showed that an Aurora A inhibitor alone did not affect trained immunity[8]. Thus, we did not include this control group in all of the experiments.

      (9) Figure 4A: looking at the unedited blot images, the blot for H3K36me3 appears in its original orientation, whereas other images appear horizontally mirrored. Please note, I don't think there is any malicious intent but this is quite sloppy and the authors should explain why/how this happened (are they different gels and the loading sequence was reversed?)

      Thank you for pointing out this error. After checking the original data, we found that we indeed misassembled the orientation of several blots. We went through the assembling process and figured out that some orientations were assembled according to the loading sequences but not saved, so that the orientations in Figure 4A were not consistent with the unedited blot image. We are sorry for the careless mistake, and we will double check to make sure all the blots are correctly assembled in the revised manuscript.

      (10) For many figures, for example prominently figure 5, the text describes 'beta-glucan training' whereas the figures actually depict acute stimulation with beta-glucan. While this is partially a semantic issue (technically, the stimulation is 'the training-phase' of the experiment), this could confuse the reader.

      We thank the reviewer for the suggestion, and we will revise our wording to ensure clarity and avoid any inconsistencies that might lead to misunderstanding.

      (11) Figure 6: Cytokines, especially IL-6 and IL-1β, can be excreted by tumour cells and have pro-tumoral functions. This is not likely in the context of the other results in this case, but since there is flow cytometry data from the tumour material it would have been nice to see also intracellular cytokine staining to pinpoint the source of these cytokines.

      We thank the reviewer for the suggestion. To address this concern, we will perform intracellular cytokine staining in tumor experiments with mice trained with β-glucan alone or in combination with alisertib, followed by MC38 inoculation.

      Reviewer #2 (Public review):

      Summary:

      This manuscript investigates the inhibition of Aurora A and its impact on β-glucan-induced trained immunity via the FOXO3/GNMT pathway. The study demonstrates that inhibition of Aurora A leads to overconsumption of SAM, which subsequently impairs the epigenetic reprogramming of H3K4me3 and H3K36me3, effectively abolishing the training effect.

      Strengths:

      The authors identify the role of Aurora A through small molecule screening and validation using a variety of molecular and biochemical approaches. Overall, the findings are interesting and shed light on the previously underexplored role of Aurora A in the induction of β-glucan-driven epigenetic change.

      We thank the reviewer for the positive and encouraging comments.

      Weaknesses:

      Given the established role of histone methylations, such as H3K4me3, in trained immunity, it is not surprising that depletion of the methyl donor SAM impairs the training response. Nonetheless, this study provides solid evidence supporting the role of Aurora A in β-glucan-induced trained immunity in murine macrophages. The in vivo data on the antitumor effect of trained immunity are insufficient to support the final claim, as alisertib could inhibit Aurora A in cell types other than myeloid cells.

      We appreciate the question raised by the reviewer. Although SAM generally acts as a methyl donor, whether the epigenetic reprogramming in trained immunity is directly linked to SAM metabolism is not known. In our study, we provide evidence suggesting that SAM maintenance is necessary to support trained immunity. In the in vivo tumor model, tumor cells were subcutaneously inoculated 24 h after oral administration of alisertib. Previous studies showed that orally administered alisertib had a half-life of 10 h and a 90% reduction in serum concentration after 24 h [9, 10]. Therefore, we suppose that tumor cells are more likely to be affected by the long-term effects of the drug on the immune system than directly by alisertib. To further address the reviewer’s concern, we will perform bone marrow transplantation (trained mice as donors and naïve mice as recipients) to clarify the mechanistic contribution of trained immunity versus off-target effects.

      Cited references

      (1) Ferreira, A.V., et al., Metabolic Regulation in the Induction of Trained Immunity. Semin Immunopathol, 2024. 46(3-4): p. 7.

      (2) Keating, S.T., et al., Rewiring of glucose metabolism defines trained immunity induced by oxidized low-density lipoprotein. J Mol Med (Berl), 2020. 98(6): p. 819-831.

      (3) Li, X., et al., Maladaptive innate immune training of myelopoiesis links inflammatory comorbidities. Cell, 2022. 185(10): p. 1709-1727.e18.

      (4) Luka, Z., S.H. Mudd, and C. Wagner, Glycine N-methyltransferase and regulation of S-adenosylmethionine levels. J Biol Chem, 2009. 284(34): p. 22507-11.

      (5) Hughey, C.C., et al., Glycine N-methyltransferase deletion in mice diverts carbon flux from gluconeogenesis to pathways that utilize excess methionine cycle intermediates. J Biol Chem, 2018. 293(30): p. 11944-11954.

      (6) Simile, M.M., et al., Nuclear localization dictates hepatocarcinogenesis suppression by glycine N-methyltransferase. Transl Oncol, 2022. 15(1): p. 101239.

      (7) Arifin, W.N. and W.M. Zahiruddin, Sample Size Calculation in Animal Studies Using Resource Equation Approach. Malays J Med Sci, 2017. 24(5): p. 101-105.

      (8) Benjaskulluecha, S., et al., Screening of compounds to identify novel epigenetic regulatory factors that affect innate immune memory in macrophages. Sci Rep, 2022. 12(1): p. 1912.

      (9) Yang, J.J., et al., Preclinical drug metabolism and pharmacokinetics, and prediction of human pharmacokinetics and efficacious dose of the investigational Aurora A kinase inhibitor alisertib (MLN8237). Drug Metab Lett, 2014. 7(2): p. 96-104.

      (10) Palani, S., et al., Preclinical pharmacokinetic/pharmacodynamic/efficacy relationships for alisertib, an investigational small-molecule inhibitor of Aurora A kinase. Cancer Chemother Pharmacol, 2013. 72(6): p. 1255-64.

    1. Author Response

      We are grateful for the constructive comments of the reviewers. Here is a provisional response to major questions.

      To Question 1, we appreciate your pointing out that, with pan-neuronal knockout of PDFR by unmodified Cas9 (Fig 2H-2I in the previous manuscript), morning anticipation still exists at some level (Fig a), and that the decrease in the morning anticipation index (Fig b) and the advance in evening activity were not as pronounced as those observed in han5304 (Fig 3C, Hyun et al., 2005). Our response is that the difference between the pan-neuronal knockout of PDFR by unmodified Cas9 and han5304 might be caused by the limited efficiency of unmodified Cas9 in our conditional system. We will adjust the relevant conclusions in the revised version; these findings underscore the need to enhance the efficiency of the original Cas9.

      Author response image 1.

      To Question 2, that some expression profiles of clock neurons are not consistent with previous reports, such as Dh31 and ChAT in s-LNvs, our response is that the differences can be attributed to the variation in expression patterns between 3’ terminal KI-LexA (used in this gene expression dissection) and KO-GAL4, KI-GAL4, or transgenic GAL4. We have indeed observed differences when identical sites were inserted in frame with Gal4 or LexA.

      To Question 3, that our description of advanced morning anticipation versus no morning anticipation with the term "opposite" is not accurate enough, our response is that we will modify that. Mutants of CNMa or CNMaR exhibit advanced morning activity, suggesting an inhibitory role of CNMa/CNMaR. Mutants of Pdf/Pdfr, on the other hand, showed no morning anticipation, indicating a promoting role in morning anticipation.

      To Question 4, whether we have generated transgenic UAS-sgRNA flies for all CCT genes or only a subset, our response is that we have indeed generated UAS-sgRNA flies for all CCT genes.

    1. Reviewer #2 (Public Review):

      The goal of the present study is to better understand the 'control objectives' that subjects adopt in a video-game-like virtual-balancing task. In this task, the hand must move in the opposite direction from a cursor. For example, if the cursor is 2 cm to the right, the subject must move their hand 2 cm to the left to 'balance' the cursor. Any imperfection in that opposition causes the cursor to move. E.g., if the subject were to move only 1.8 cm, that would be insufficient, and the cursor would continue to move to the right. If they were to move 2.2 cm, the cursor would move back toward the center of the screen. This return to center might actually be 'good' from the subject's perspective, depending on whether their objective is to keep the cursor still or keep it near the screen's center. Both are reasonable 'objectives' because the trial fails if the cursor moves too far from the screen's center during each six-second trial.

      This task was recently developed for use in monkeys (Quick et al., 2018), with the intention of being used for the study of the cortical control of movement, and also as a task that might be used to evaluate BMI control algorithms. The purpose of the present study is to better characterize how this task is performed. What sort of control policies are used? Perhaps more deeply, what kind of errors are those policies trying to minimize? To address these questions, the authors simulate control-theory style models and compare with behavior. They do so in both monkeys and humans.

      These goals make sense as a precursor to future recording or BMI experiments. The primate motor-control field has long been dominated by variants of reaching tasks, so introducing this new task will likely be beneficial. This is not the first non-reaching task, but it is an interesting one and it makes sense to expand the presently limited repertoire of tasks. The present task is very different from any prior task I know of. Thus, it makes sense to quantify behavior as thoroughly as possible in advance of recordings. Understanding how behavior is controlled is, as the authors note, likely to be critical to interpreting neural data.

      From this perspective - providing a basis for interpreting future neural results - the present study is fairly successful. Monkeys seem to understand the task properly, and to use control policies that are not dissimilar from humans. Also reassuring is the fact that behavior remains sensible even when task difficulty becomes high. By 'sensible' I simply mean that behavior can be understood as seeking to minimize error: position, velocity, or (possibly) both, and that this remains true across a broad range of task difficulties. The authors document why minimizing position and minimizing velocity are both reasonable objectives. Minimizing velocity is reasonable, because a near-stationary cursor can't move far in six seconds. Minimizing position error is reasonable, because the trial won't fail if the cursor doesn't stray far from the center. This is formally demonstrated by simulating control policies: both objectives lead to control policies that can perform the task and produce realistic single-trial behavior. The authors also demonstrate that, via verbal instruction, they can induce human subjects to favor one objective over the other. These all seem like things that are on the 'need to know' list, and it is commendable that this amount of care is being taken before recordings begin, as it will surely aid interpretation.

      Yet as a stand-alone study, the contribution to our understanding of motor control is more limited. The task allows two different objectives (minimize velocity, minimize position) to be equally compatible with the overall goal (don't fail the trial). Or more precisely, there exists a range of objectives with those two at the extreme. So it makes sense that different subjects might choose to favor different objectives, and also that they can do so when instructed. But has this taught us something about motor control, or simply that there is a natural ambiguity built into the task? If I ask you to play a game, but don't fully specify the rules, should I be surprised that different people think the rules are slightly different?

      The most interesting scientific claim of this study is not the subject-to-subject variability; the task design makes that quite likely and natural. Rather, the central scientific result is the claim that individual subjects are constantly switching objectives (and thus control policies), such that the policy guiding behavior differs dramatically even on a single-trial basis. This scientific claim is supported by a technical claim: that the authors' methods can distinguish which objective is in use, even on single trials. I am uncertain of both claims.

      Consider Figure 8B, which reprises a point made in Figures 1 and 3 and gives the best evidence for trial-to-trial variability in objective/policy. For every subject, there are two example trials. The top row of trials shows oscillations around the center, which could be consistent with position-error minimization. The bottom row shows tolerance of position errors so long as drift is slow, which could be consistent with velocity-error minimization. But is this really evidence that subjects were switching objectives (and thus control policies) from trial to trial? A simpler alternative would be a single control policy that does not switch, but still generates this range of behaviors. The authors don't really consider this possibility, and I'm not sure why. One can think of a variety of ways in which a unified policy could produce this variation, given noise and the natural instability of the system.

      Indeed, I found that it was remarkably easy to produce a range of reasonably realistic behaviors, including the patterns that the authors interpret as evidence for switching objectives, based on a simple fixed controller. To run the simulations, I assumed that subjects simply attempt to match their hand position to oppose the cursor position. Because subjects cannot see their hand, I assumed modest variability in the gain, with a range from -1 to -1.05. I assumed a small amount of motor noise in the outgoing motor command. The resulting (very simple) controller naturally displayed the basic range of behaviors observed across trials (see Image 1).

      Peer review image 1.

      Some trials had oscillations around the screen center (zero), which is the pattern the authors suggest reflects position control. In other trials the cursor was allowed to drift slowly away from the center, which is the pattern the authors suggest reflects velocity control. This is true even though the controller was the same on every trial. Trial-to-trial differences were driven both by motor noise and by the modest variability in gain. In an unstable system, small differences can lead to (seemingly) qualitatively different behavior on different trials.
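
      The fixed-controller argument above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used to produce Peer review image 1: the cursor dynamics dx/dt = lam * (x + u), the value of lam, the noise level, the initial offset, and all function and parameter names are hypothetical choices made for the sketch.

```python
import random


def simulate_trial(gain, lam=3.0, noise_sd=0.05, dt=0.01,
                   duration=6.0, x0=0.1, seed=None):
    """Simulate one trial with a fixed proportional controller.

    Assumed dynamics (hypothetical): dx/dt = lam * (x + u), where x is
    the cursor position and u is the hand position. The controller sets
    u = gain * x plus Gaussian motor noise, with gain near -1.
    Returns the list of cursor positions (Euler integration).
    """
    rng = random.Random(seed)
    x = x0
    trace = [x]
    for _ in range(int(round(duration / dt))):
        # Hand opposes the cursor, with a small error in the motor command.
        u = gain * x + rng.gauss(0.0, noise_sd)
        # Any mismatch between hand and cursor moves the cursor.
        x += lam * (x + u) * dt
        trace.append(x)
    return trace


def simulate_session(n_trials=20, gain_range=(-1.05, -1.0), seed=0):
    """Run several trials with the same controller.

    Only the per-trial gain (drawn uniformly, an assumption) and the
    motor noise differ between trials.
    """
    rng = random.Random(seed)
    traces = []
    for i in range(n_trials):
        gain = rng.uniform(*gain_range)
        traces.append(simulate_trial(gain, seed=seed + 1 + i))
    return traces
```

      In this sketch, a gain of exactly -1 leaves the cursor where it is apart from noise-driven drift, while a gain slightly beyond -1 pulls the cursor back toward the center. Plotting the traces from `simulate_session` is one way to check how varied the trials look when the controller itself never changes.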

      This simple controller is also compatible with the ability of subjects to adapt their strategy when instructed. Anyone experienced with this task likely understands (or has learned) that moving the hand slightly more than 'one should' will tend to shepherd the cursor back to center, at the cost of briefly high velocity. Using this strategy more sparingly will tend to minimize velocity even if position errors persist. Thus, any subject using this control policy would be able to adapt their strategy via a modest change in gain (the gain linking visible cursor position to intended hand position).

      This model is simple, and there may be reasons to dislike it. But it is presumably a reasonable model. The nature of the task is that you should move your hand opposite where the cursor is. Because you can't see your hand, you will make small mistakes. Due to the instability of the system, those small mistakes have large and variable effects. This feature is likely common to other controllers as well; many may explicitly or implicitly blend position and velocity control, with different trials appearing more dominated by one versus the other. Given this, I think the study presents only weak evidence that individual subjects are switching their objective on individual trials. Indeed, the more parsimonious explanation may be that they aren't. While the study certainly does demonstrate that the control policy can be influenced by verbal instructions, this might be a small adjustment as noted above.

      I thus don't feel convinced that the authors can conclusively tell us the true control policy being used by human and monkey subjects, nor whether that policy is mostly fixed or constantly switching. The data are potentially compatible with any of these interpretations, depending on which control-style model one prefers.

      I see a few paths that the authors might take if they choose.

      --First, my reasoning above might be faulty, or there might be additional analyses that could rule out the possibility of a unified policy underlying variable behavior. If so, the authors may be able to reject the above concerns and retain the present conclusions. The main scientifically novel conclusion of the present study is that subjects are using a highly variable control policy, and switching on individual trials. If this is indeed the case, there may be additional analyses that could reveal that.

      --Second, additional trial types (e.g., with various perturbations) might be used as a probe of the control policy. As noted below, there is a long history of doing this in the pursuit system. That additional data might better disambiguate control policies both in general, and across trials.

      --Third, the authors might find that a unified controller is actually a good (and more parsimonious) explanation, which might actually be a good thing from the standpoint of future experiments. Interpretation of neural data is likely to be much easier if the control policy being instantiated isn't in constant flux.

      In any case, I would recommend altering the strength of some conclusions, particularly the conclusion that the presented methods can reliably discriminate amongst objectives/policies on individual trials. This is mentioned as a major motivation on multiple occasions, but in most of these instances, the subsequent analysis infers the objective only across trials (e.g., one must observe a scatterplot of many trials). By Figure 7, they do introduce a method for inferring the control policy on individual trials, and while this seems to work considerably better than chance, it hardly appears reliable.

      In this same vein I would suggest toning down aspects of the Introduction and Discussion. The Introduction in particular is overly long, and tries to position the present study as unique in ways that seem strained. Other studies have built links between human behavior, monkey behavior, and monkey neural data (for just one example, consider the corpus of work from the Scott lab that includes Pruszynski et al. 2008 and 2011). Other studies have used highly quantitative methods to infer the objective function used by subjects (e.g. Kording and Wolpert 2004). The very issue that is of interest in the present study - velocity-error-minimization versus position-error-minimization - has been extensively addressed in the smooth pursuit system. That field has long combined quantitative analyses of behavior in humans and monkeys, along with neural recordings. Many pursuit experiments used strategies that could be fruitfully employed to address the central questions of the present study. For example, error stabilization was important for dissecting the control policy used by the pursuit system. By artificially stabilizing the error (position or velocity) at zero, or at some other value, one can determine the system's response. The classic Rashbass step (1961) put position and velocity errors in opposition, to see which dominates the response. Step and sinusoidal perturbations were useful in distinguishing between models, as was the imposition of artificially imposed delays. The authors note the 'richness' of the behavior in the present task, and while one could say the same of pursuit, it was still the case that specific and well-thought through experimental manipulations were pretty critical. It would be better if the Introduction considered at least some of the above-mentioned work (or other work in a similar vein). 

      While most would agree with the motivations outlined by the authors - they are logical and make sense - the present Introduction runs the risk of overselling the present conclusions while underselling prior work.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In their previous publication (Dong et al. Cell Reports 2024), the authors showed that citalopram treatment resulted in reduced tumor size by binding to the E380 site of GLUT1 and inhibiting the glycolytic metabolism of HCC cells, instead of the classical citalopram receptor. Given that C5aR1 was also identified as a potential receptor of citalopram in the previous report, the authors focused on exploring the potential immune-dependent anti-tumor effect of citalopram via C5aR1. C5aR1 was found to be expressed on tumor-associated macrophages (TAMs), and citalopram administration showed potential to improve the stability of C5aR1 in vitro. Through macrophage depletion and adoptive transfer approaches in HCC mouse models, the data demonstrated the potential importance of C5aR1-expressing macrophages in the anti-tumor effect of citalopram in vivo. Mechanistically, their in vitro data suggested that citalopram may regulate the phagocytosis potential and polarization of macrophages through C5aR1. Next, they tried to investigate the direct link between citalopram and CD8+T cells by including an additional MASH-associated HCC mouse model. Their data suggest that citalopram may upregulate the glycolytic metabolism of CD8+T cells, probably via GLUT3-mediated but not GLUT1-mediated glucose uptake. Lastly, as the systemic 5-HT level is down-regulated by citalopram, the authors analyzed the association between a low 5-HT level and superior CD8+T cell function against a tumor. Although the data are informative, the rationale for working on additional mechanisms and the logical links among the different parts are not clear. In addition, some of the conclusions are not fully supported by the current data.

      Thank you very much for your insightful evaluation and constructive suggestions. We have thoroughly studied the comments, and a provisional point-by-point response follows.

      Strengths:

      The idea of repurposing drugs already in clinical use shows great potential for immediate clinical translation. The data here suggest that the antidepressant citalopram plays an immune regulatory role in TAMs via a new target, C5aR1, in HCC.

      Thank you for your constructive comments. We believe that further investigation into the mechanisms by which citalopram modulates TAM function could provide valuable insights into its potential role in HCC therapy.

      Weaknesses:

      (1) The authors concluded that citalopram had a 'potential immune-dependent effect' based on the tumor weight difference between Rag-/- and C57 mice in Figure 1. However, tumor weight differences may also be attributed to a non-immune regulatory pathway. In addition, how do the authors calculate relative tumor weight? What is the rationale for using relative one but not absolute tumor weight to reflect the anti-tumor effect?

      We appreciate your insights into the potential contributions of non-immune regulatory pathways to the observed tumor weight differences between Rag-/- and C57 mice, and we will further address this issue in our discussion. The relative tumor weight was calculated by assigning an arbitrary value of 1 to the Rag1<sup>-/-</sup> mice in the DMSO treatment group, with all other tumor weights expressed relative to this baseline. As suggested, we will include absolute tumor weight data in our revised manuscript.
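      The normalization described in this response can be sketched as follows. This is a minimal illustration, not the authors' analysis code: it assumes that the baseline value of 1 corresponds to the mean tumor weight of the Rag1-/- DMSO group, and the function name, group labels, and weights are all hypothetical.

```python
from statistics import mean


def relative_tumor_weights(weights_by_group, baseline_group):
    """Divide every tumor weight by the mean weight of the baseline group.

    Assumption: "assigning a value of 1" to the baseline group means that
    the group mean is used as the divisor.
    """
    baseline_mean = mean(weights_by_group[baseline_group])
    if baseline_mean <= 0:
        raise ValueError("baseline mean must be positive")
    return {
        group: [weight / baseline_mean for weight in weights]
        for group, weights in weights_by_group.items()
    }


# Hypothetical weights in grams, for illustration only (not data from the study).
example = {
    "Rag1-/- DMSO": [1.0, 1.2, 0.8],
    "Rag1-/- citalopram": [0.9, 0.6],
    "C57 DMSO": [0.75, 1.05],
}
relative = relative_tumor_weights(example, "Rag1-/- DMSO")
```

      Because every value is divided by the same constant, relative weights preserve the ratios between groups but hide the absolute scale, which is why reporting absolute weights alongside them is informative.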

      (2) The authors used shSlc6a4 tumor cell lines to demonstrate that citalopram's effects are independent of the conventional SERT receptor (Figure 1C-F). However, this does not entirely exclude the possibility that SERT may still play a role in this context, as it can be expressed in other cells within the tumor microenvironment. What is the expression profiling of Slc6a4 in the HCC tumor microenvironment? In addition, in Figure 1F, the tumor growth of shSlc6a4 in C57 mice displayed a decreased trend, suggesting a possible role of Slc6a4.

      To identify the expression patterns of Slc6a4 in different cellular contexts within the HCC tumor microenvironment, we will conduct a thorough screening of HCC datasets that include single-cell sequencing analysis. The possible role of Slc6a4 on tumor growth will be verified with in vitro loss-of-function experiments.

      (3) Why did the authors choose to study phagocytosis in Figures 3G-H? As an important player, TAM regulates tumor growth via various mechanisms.

      Thank you for your question. We focused on this aspect because citalopram targets C5aR1-expressing TAM. C5aR1 is a receptor for complement component C5a, and complement components play a significant role in mediating the phagocytosis process in macrophages. In the revised manuscript, we will emphasize this rationale clearly.

      (4) The information on unchanged deposition of C5a has been mentioned in this manuscript (Figures 3D and 3F), the authors should explain further in the manuscript, for example, C5a could bind to receptors other than C5aR1 and/or C5a bind to C5aR1 by different docking anchors compared with citalopram.

      Thank you for your insightful comment. First, we will investigate the docking anchors involved in the binding of C5a to C5aR1 and compare these interactions with those of C5aR1 and citalopram. Additionally, we will discuss the potential binding of C5a to other receptors, providing a broader perspective on the signaling mechanisms.

      (5) Figure 3I-M - the flow cytometry data suggested that citalopram treatment altered the proportions of total TAM, M1 and M2 subsets, CD4+ and CD8+T cells, DCs, and B cells. Why do the authors conclude that the enhanced phagocytosis of TAM was one of the major mechanisms of citalopram? As the overall TAM number was regulated, the contribution of phagocytosis to tumor growth may be limited.

      As suggested, we will restate the conclusion to enhance clarity and better articulate the relationship between citalopram treatment, TAM populations, and their phagocytic activity. Thank you for your valuable input.

      (6) Figure 4 - what is the rationale for using the MASH-associated HCC mouse model to study metabolic regulation in CD8+T cells? The tumor microenvironment and tumor growth would be quite different. In addition, how does this part link up with the mechanisms related to C5aR1 and TAM? The authors also brought GLUT1 back in the last part and focused on CD8+T cell metabolism, which was totally separated from previous data.

      We chose the MASH-associated HCC mouse model because it closely mimics the etiology of metabolic-associated fatty liver disease (MAFLD), which is a significant contributor to the development of cirrhosis and HCC. The inclusion of CD8<sup>+</sup> T cells in our study is based on the understanding that citalopram targets GLUT1, which plays a crucial role in glucose uptake. CD8<sup>+</sup> T cell function is heavily reliant on glycolytic metabolism, making it essential to investigate how citalopram’s effects on GLUT1 influence the metabolic pathways and functionality of these immune cells. The data presented in this section primarily aim to demonstrate how citalopram influences peripheral 5-HT levels, which subsequently affects CD8<sup>+</sup> T cell functionality. By linking these findings, we will clarify how citalopram impacts both TAM and CD8<sup>+</sup> T cells. In the revised manuscript, we will enhance the background information and provide relevant data support to avoid any gaps.

      (7) Figure 5, the authors illustrated their mechanism that citalopram regulates CD8+T cell anti-tumor immunity through proinflammatory TAM with no experimental evidence. Using only CD206 and MHCII to represent TAM subsets obviously is not sufficient.

      As suggested, more relevant experimental data will be included in the revised manuscript to better characterize the TAM populations and their roles in mediating the effects of citalopram on CD8<sup>+</sup> T cells.

      Reviewer #2 (Public review):

      Summary:

      Dong et al. present a thorough investigation into the potential of repurposing citalopram, an SSRI, for hepatocellular carcinoma (HCC) therapy. The study highlights the dual mechanisms by which citalopram exerts anti-tumor effects: reprogramming tumor-associated macrophages (TAMs) toward an anti-tumor phenotype via C5aR1 modulation and suppressing cancer cell metabolism through GLUT1 inhibition while enhancing CD8+ T cell activation. The findings emphasize the potential of drug repurposing strategies and position C5aR1 as a promising immunotherapeutic target. However, certain aspects of experimental design and clinical relevance could be further developed to strengthen the study's impact.

      Thank you for your thoughtful review and constructive feedback, and we look forward to improving our manuscript accordingly.

      Strength:

      It provides detailed evidence of citalopram's non-canonical action on C5aR1, demonstrating its ability to modulate macrophage behavior and enhance CD8+ T cell cytotoxicity. The use of DARTS assays, in silico docking, and gene signature network analyses offers robust validation of drug-target interactions. Additionally, the dual focus on immune cell reprogramming and metabolic suppression presents a thorough strategy for HCC therapy. By emphasizing the potential for existing drugs like citalopram to be repurposed, the study also underscores the feasibility of translational applications.

      Your insights reinforce the significance of our findings, and we will ensure that these points are clearly articulated in the revised manuscript to enhance its impact.

      Major weaknesses/suggestions:

      The dataset and signature database used for GSEA analyses are not clearly specified, limiting reproducibility. The manuscript does not fully explore the potential promiscuity of citalopram's interactions across GLUT1, C5aR1, and SERT1, which could provide a deeper understanding of binding selectivity. The absence of GLUT1 knockdown or knockout experiments in macrophages prevents a complete assessment of GLUT1's role in macrophage versus tumor cell metabolism. Furthermore, there is minimal discussion of clinical data on SSRI use in HCC patients. Incorporating survival outcomes based on SSRI treatment could strengthen the study's translational relevance.

      By addressing these limitations, the manuscript could make an even stronger contribution to the fields of cancer immunotherapy and drug repurposing.

      We appreciate your valuable suggestions. As suggested, we will take the following actions:

      (1) GSEA analysis: we will clearly specify the datasets and signature databases used for the GSEA in the revised manuscript.

      (2) Exploration of binding selectivity: we recognize the importance of exploring the potential promiscuity of citalopram’s interactions across GLUT1, C5aR1, and SERT1. As suggested, we will include a more detailed analysis of these interactions, which will help elucidate binding selectivity and its implications for therapeutic outcomes.

      (3) GLUT1 knockdown in macrophages: to address the gap in our assessment of GLUT1’s role in macrophages, we will incorporate GLUT1 knockdown or knockout experiments in macrophages upon citalopram treatment. Moreover, a DARTS assay for GLUT1 in THP-1 cells will be conducted.

      (4) Clinical data on SSRI use in HCC patients: related data have been reported previously (PMID: 39388353; Cell Rep. 2024 Oct 22;43(10):114818), as detailed below:

      “SSRIs use is associated with reduced disease progression in HCC patients

      We determined whether SSRIs for alleviating HCC are supported by real-world data. A total of 3061 patients with liver cancer were extracted from the Swedish Cancer Register. Among them, 695 patients had been administrated with post-diagnostic SSRIs. The Kaplan-Meier survival analysis suggested that patients who utilized SSRIs exhibited a significantly improved metastasis-free survival compared to those who did not use SSRIs, with a P value of log-rank test at 0.0002. Cox regression analysis showed that SSRI use was associated with a lower risk of metastasis (HR = 0.78; 95% CI, 0.62-0.99).”

      Author response image 1.

    1. Author Response

      eLife assessment

      The authors' finding that PARG hydrolase removal of polyADP-ribose (PAR) protein adducts generated in response to the presence of unligated Okazaki fragments is important for S-phase progression is potentially valuable, but the evidence is incomplete, and identification of relevant PARylated PARG substrates in S-phase is needed to understand the role of PARylation and dePARylation in S-phase progression. Their observation that human ovarian cancer cells with low levels of PARG are more sensitive to a PARG inhibitor, presumably due to the accumulation of high levels of protein PARylation, suggests that low PARG protein levels could serve as a criterion to select ovarian cancer patients for treatment with a PARG inhibitor drug.

      Thank you for the assessment and summary. Please see below for details as we have now addressed the deficiencies pointed out by the reviewers.

      We believe that PARP1 is one of the major relevant PARG substrates in S phase cells. Previous studies reported that PARP1 recognizes unligated Okazaki fragments and induces S phase PARylation, which recruits single-strand break repair proteins such as XRCC1 and LIG3 that act as a backup pathway for Okazaki fragment maturation (Hanzlikova et al., 2018; Kumamoto et al., 2021). In this study, we revealed that accumulation of PARP1/2-dependent S phase PARylation eventually led to cell death (Fig. 2). Furthermore, we found that chromatin-bound PARP1 as well as PARylated PARP1 increased in PARG KO cells (Fig. S4A and Fig. 4A), suggesting that PARP1 is one of the key substrates of PARG in S phase cells. Of course, PARG may have additional substrates besides PARP1 that are required for its roles in S phase progression, as PARG is known to be recruited to DNA damage sites through pADPr- and PCNA-dependent mechanisms (Mortusewicz et al., 2011). Precisely how PARG regulates S phase progression warrants further investigation.

      Reviewer #1 (Public Review):

      I have a major conceptual problem with this manuscript: How can the full deletion of a gene (PARG) sensitize a cell to further inhibition by its chemical inhibitor (PARGi) since the target protein is fully absent?

      Please see below for details about this point. Briefly, we found that PARG is an essential gene (Fig. 7). There was residual PARG activity in our PARG KO cells, although the loss of full-length PARG was confirmed by Western blotting and DNA sequencing (Fig. S9). The residual PARG activity in these cells can be further inhibited by the PARG inhibitor, which eventually leads to cell death.

      The authors state in the discussion section: "The residual PARG dePARylation activity observed in PARG KO cells likely supports cell growth, which can be further inhibited by PARGi". What does this statement mean? Is the authors' conclusion that their PARG KOs are not true KOs but partial hypomorphic knockdowns? Were the authors working with KO clones or CRISPR deletion in populations of cells?

      The reviewer is correct that our PARG KOs are not true KOs. We were working with CRISPR edited KO clones. As shown in this manuscript, we validated our KO clones by Western blotting, DNA sequencing and MMS-induced PARylation. Despite these efforts and our inability to detect full-length PARG in our KO clones, we suspect that our PARG KO cells may still express one or more active fragments of PARG due to alternative splicing and/or alternative ATG usage.

      As shown in Fig. 7, we believe that PARG is essential for proliferation. Our initial KO cell lines are not complete PARG KO cells and residual PARG activity in these cells could support cell proliferation. Unfortunately, due to lack of appropriate reagents we could not draw solid conclusions regarding the isoforms or the truncated PARG expressed in these cells (Please see Western blots below).

      Are there splice variants of PARG that were not knocked down? Are there PARP paralogues that can complement the biochemical activity of PARG in the PARG KOs? The authors do not discuss these critical issues nor engage with this problem.

      There are five reviewed or potential PARG isoforms identified in the Uniprot database. The sgRNAs used to generate initial PARG KO cells in this manuscript target all three catalytically active isoforms (isoforms 1, 2 and 3), while isoforms 4 and 5 are considered catalytically inactive according to the Uniprot database. However, it is likely that sgRNA-mediated genome editing may lead to the creation of new alternatively spliced PARG mRNAs or the use of an alternative ATG, which can produce catalytically active forms of PARG. Instead of searching for these putative spliced PARG RNAs, we used two independent antibodies that recognize the C-terminus of PARG for WB as shown in Author response image 1. Unfortunately, besides full-length PARG, these antibodies also recognized several other bands, some of which were reduced or absent in PARG KO cells while others were not. Thus, we could not draw a clear conclusion as to which functional isoform was expressed in our PARG KO cells. Nevertheless, we directly measured PARG activity in PARG KO cells (Fig. S9) and were still able to detect residual activity. These data clearly indicate that residual PARG activity is present in our KO cells, but the precise nature of these truncated forms of PARG remains elusive.

      Author response image 1.

      These issues have to be dealt with upfront in the manuscript for the reader to make sense of their work.

      We thank this reviewer for his/her constructive comments and suggestions. We will include the data above and additional discussion upfront in our revised manuscript to avoid any further confusion by our readers.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Nie et al investigate the effect of PARG KO and PARG inhibition (PARGi) on pADPR, DNA damage, cell viability, and synthetic lethal interactions in HEK293A and Hela cells. Surprisingly, the authors report that PARG KO cells are sensitive to PARGi and show higher pADPR levels than PARG KO cells, which are abrogated upon deletion or inhibition of PARP1/PARP2. The authors explain the sensitivity of PARG KO to PARGi through incomplete PARG depletion and demonstrate complete loss of PARG activity when incomplete PARG KO cells are transfected with additional gRNAs in the presence of PARPi. Furthermore, the authors show that the sensitivity of PARG KO cells to PARGi is not caused by NAD depletion but by S-phase accumulation of pADPR on chromatin coming from unligated Okazaki fragments, which are recognized and bound by PARP1. Consistently, PARG KO or PARG inhibition shows synthetic lethality with Pol beta, which is required for Okazaki fragment maturation. PARG expression levels in ovarian cancer cell lines correlate negatively with their sensitivity to PARGi.

      Thank you for your nice comments. The complete loss of PARG activity was observed in PARG complete/conditional KO (cKO) cells. These cKO clones were generated using wild-type cells transfected with sgRNAs targeting the catalytic domain of PARG in the presence of PARP inhibitor.

      Strengths:

      The authors show that PARG is essential for removing ADP-ribosylation in S-phase.

      Thanks!

      Weaknesses:

      1) This begs the question as to the relevant substrates of PARG in S-phase, which could be addressed, for example, by analysing PARylated proteins associated with replication forks in PARG-depleted cells (EdU pulldown and Af1521 enrichment followed by mass spectrometry).

      We believe that PARP1 is one of the major relevant PARG substrates in S phase cells. Previous studies reported that PARP1 recognizes unligated Okazaki fragments and induces S phase PARylation, which recruits single-strand break repair proteins such as XRCC1 and LIG3 that act as a backup pathway for Okazaki fragment maturation (Hanzlikova et al., 2018; Kumamoto et al., 2021). In this study, we revealed that accumulation of PARP1/2-dependent S phase PARylation eventually led to cell death (Fig. 2). Furthermore, we found that chromatin-bound PARP1 as well as PARylated PARP1 increased in PARG KO cells (Fig. S4A and Fig. 4A), suggesting that PARP1 is one of the key substrates of PARG in S phase cells. Of course, PARG may have additional substrates besides PARP1 that are required for its roles in S phase progression, as PARG is known to be recruited to DNA damage sites through pADPr- and PCNA-dependent mechanisms (Mortusewicz et al., 2011). Precisely how PARG regulates S phase progression warrants further investigation.

      2) The results showing the generation of a full PARG KO should be moved to the beginning of the Results section, right after the first Results chapter (PARG depletion leads to drastic sensitivity to PARGi), otherwise, the reader is left to wonder how PARG KO cells can be sensitive to PARGi when there should be presumably no PARG present.

      Thank you for your suggestion! However, we would like to keep the complete PARG KO result at the end of the Results section, since this was how this project evolved. Initially, we did not know that PARG is an essential gene. Thus, we speculated that PARGi may target not only PARG but also a second target, which only becomes essential in the absence of PARG. To test this possibility, we performed FACS-based and cell survival-based whole-genome CRISPR screens (Fig. 5). However, this putative second target was not revealed by our CRISPR screening data (Fig. 5). We then tested the possibility that these cells may have residual PARG expression or activity and that only cells with very low PARG expression are sensitive to PARGi, which turned out to be the case for ovarian cancer cells. Equipped with a PARP inhibitor and sgRNAs targeting the catalytic domain of PARG, we finally generated cells with complete loss of PARG activity to prove that PARG is an essential gene (Fig. 7). This series of experiments underscores the challenge of validating any KO cell line: the identification of frame-shift mutations, the absence of full-length proteins, and phenotypic changes may still not be sufficient to validate KO clones. This is an important lesson we learned, and we would like to share it with the scientific community.

      To avoid further misunderstanding, we will include additional statements/comments at the end of the “PARG depletion leads to drastic sensitivity to PARGi” section and at the beginning of the “CRISPR screens reveal genes responsible for regulating pADPr signaling and/or cell lethality in WT and PARG KO cells” section. We hope that the revised manuscript will make this clear.

      3) Please indicate in the first figure which isoforms were targeted with gRNAs, given that there are 5 PARG isoforms. You should also highlight that the PARG antibody only recognizes the largest isoform, which is clearly absent in your PARG KO, but other isoforms may still be produced, depending on where the cleavage sites were located.

      The sgRNAs used to generate PARG KO cells in this manuscript target all three catalytically active isoforms (isoforms 1, 2 and 3), while isoforms 4 and 5 are considered catalytically inactive according to the Uniprot database. As suggested, we will modify Fig. S1D and the figure legends.

      Although the manufacturer's instructions state that the Anti-PARG antibody (66564S) recognizes only isoform 1, this antibody could also recognize isoforms 2 and 3, albeit weakly, based on Western blot results with lysates prepared from PARG cKO cells reconstituted with different PARG isoforms, as shown below. As suggested, we will add a statement in the revised manuscript and provide the Western blotting data in Author response image 2.

      Author response image 2.

      To test whether other isoforms were expressed in 293A and/or HeLa cells, we used two independent antibodies that recognize the C-terminus of PARG for WB as shown in Author response image 3. Unfortunately, besides full-length PARG, these antibodies also recognized several other bands, some of which were reduced or absent in PARG KO cells while others were not. Thus, we could not draw a clear conclusion as to which functional isoforms or truncated forms were expressed in our PARG KO cells.

      Author response image 3.

      4) FACS data need to be quantified. Scatter plots can be moved to Supplementary while quantification histograms with statistical analysis should be placed in the main figures.

      We agree with this reviewer that quantification of FACS data may provide straightforward results for some of our data. However, it is challenging to quantify positive S phase pADPr signaling in some panels, for example in Fig. 3A and Fig. 4C. In both panels, pADPr signaling was detected throughout the cell cycle and therefore it is difficult to know the percentage of S phase pADPr signaling in these samples. Thus, we decided to keep the scatter plots to demonstrate the dramatic and S phase-specific pADPr signaling in PARG KO cells treated with PARGi. We hope that these data are clear and convincing even without any quantification.

      5) All colony formation assays should be quantified and sensitivity plots should be shown next to example plates.

      As suggested, we will include the sensitivity plot next to Fig. 3D. However, other colony formation assays in this study were performed with a single concentration of inhibitor and therefore we will not provide sensitivity plots for these experiments. Nevertheless, the results of these experiments are straightforward and easy to interpret.

      6) Please indicate how many times each experiment was performed independently and include statistical analysis.

      As suggested, we will add this information in the revised manuscript.

      Reviewer #3 (Public Review):

      Here the authors carried out a CRISPR/sgRNA screen with a DDR gene-targeted mini-library in HEK293A cells looking for genes whose loss increased sensitivity to treatment with the PARG inhibitor, PDD00017273 (PARGi). Surprisingly they found that PARG itself, which encodes the cellular poly(ADP-ribose) glycohydrolase (dePARylation) enzyme, was a major hit. Targeted PARG KO in 293A and HeLa cells also caused high sensitivity to PARGi. When PARG KO cells were reconstituted with catalytically-dead PARG, MMS treatment caused an increase in PARylation, not observed when cells were reconstituted with WT PARG or when the PARG KO was combined with PARP1/2 DKO, suggesting that loss of PARG leads to a strong PARP1/2-dependent increase in protein PARylation. The decrease in intracellular NAD+, the substrate for PARP-driven PARylation, observed in PARG KO cells was reversed by treatment with NMN or NAM, and this treatment partially rescued the PARG KO cell lethality. However, since NAD+ depletion with the FK866 nicotinamide phosphoribosyltransferase (NAMPT) inhibitor did not induce a similar lethality the authors concluded that NAD+ depletion/reduction was only partially responsible for the PARGi toxicity. Interestingly, PARylation was also observed in untreated PARG KO cells, specifically in S phase, without a significant rise in γH2AX signals. Using cells synchronized at G1/S by double thymidine blockade and release, they showed that entry into S phase was necessary for PARGi to induce PARylation in PARG KO cells. They found an increased association of PARP1 with a chromatin fraction in PARG KO cells independent of PARGi treatment, and suggested that PARP1 trapping on chromatin might account in part for the increased PARGi sensitivity. They also showed that prolonged PARGi treatment of PARG KO cells caused S phase accumulation of pADPr eventually leading to DNA damage, as evidenced by increased anti-γH2AX antibody signals and alkaline comet assays. Based on the use of emetine, they deduced that this response could be caused by unligated Okazaki fragments. Next, they carried out FACS-based CRISPR screens to identify genes that might be involved in cell lethality in WT and PARG KO cells, finding that loss of base excision repair (BER) and DNA repair genes led to increased PARylation and PARGi sensitivity, whereas loss of PARP1 had the opposite effects. They also found that BER pathway disruption exhibited synthetic lethality with PARGi treatment in both PARG KO cells and WT cells, and that loss of genes involved in Okazaki fragment ligation induced S phase pADPr signaling. In a panel of human ovarian cancer cell lines, PARGi sensitivity was found to correlate with low levels of PARG mRNA, and they showed that the PARGi sensitivity of cells could be reduced by PARPi treatment. Finally, they addressed the conundrum of why PARG KO cells should be sensitive to a specific PARG inhibitor if there is no PARG to inhibit and found that the PARG KO cells had significant residual PARG activity when measured in a lysate activity assay, which could be inhibited by PARGi, although the inhibited PARG activity levels remained higher than those of PARG cKO cells (see below). This led them to generate new, more complete PARG KO cells they called complete/conditional KO (cKO), whose survival required the inclusion of the olaparib PARPi in the growth medium. These PARG cKO cells exhibited extremely low levels of PARG activity in vitro, consistent with a true PARG KO phenotype.

      We thank this reviewer for his/her constructive comments and suggestions.

      The finding that human ovarian cancer cells with low levels of PARG are more sensitive to inhibition with a small molecule PARG inhibitor, presumably due to the accumulation of high levels of protein PARylation (pADPr) that are toxic to cells is quite interesting, and this could be useful in the future as a diagnostic marker for preselection of ovarian cancer patients for treatment with a PARG inhibitor drug. The finding that loss of base excision repair (BER) and DNA repair genes led to increased PARylation and PARGi sensitivity is in keeping with the conclusion that PARG activity is essential for cell fitness, because it prevents excessive protein PARylation. The observation that increased PARylation can be detected in an unperturbed S phase in PARG KO cells is also of interest. However, the functional importance of protein PARylation at the replication fork in the normal cell cycle was not fully investigated, and none of the key PARylation targets for PARG required for S phase progression were identified. Overall, there are some interesting findings in the paper, but their impact is significantly lessened by the confusing way in which the paper has been organized and written, and this needs to be rectified.

      We believe that PARP1 is one of the major relevant PARG substrates in S phase cells. Previous studies reported that PARP1 recognizes unligated Okazaki fragments and induces S phase PARylation, which recruits single-strand break repair proteins such as XRCC1 and LIG3 that act as a backup pathway for Okazaki fragment maturation (Hanzlikova et al., 2018; Kumamoto et al., 2021). In this study, we revealed that accumulation of PARP1/2-dependent S phase PARylation eventually led to cell death (Fig. 2). Furthermore, we found that chromatin-bound PARP1 as well as PARylated PARP1 increased in PARG KO cells (Fig. S4A and Fig. 4A), suggesting that PARP1 is one of the key substrates of PARG in S phase cells. Of course, PARG may have additional substrates besides PARP1 that are required for its roles in S phase progression, as PARG is known to be recruited to DNA damage sites through pADPr- and PCNA-dependent mechanisms (Mortusewicz et al., 2011). Precisely how PARG regulates S phase progression warrants further investigation.

      As suggested, we will revise our manuscript accordingly and provide additional explanation/statement upfront to avoid any misunderstandings.

    1. Author Response:

      Reviewer #1 (Public review):

      The authors of this study use electron microscopy and 3D reconstruction techniques to study the morphology of distinct classes of Drosophila sensory neurons *across many neurons of the same class.* This is a comprehensive study attempting to look at nearly all the sensory neurons across multiple sensilla to a) determine how much morphological variability exists between and within neurons of different and similar sensory classes, and b) identify dendritic features that may have evolved to support particular sensory functions. This study builds upon the authors' previous work, which allowed them to identify and distinguish sensory neuron subtypes in the EM volumes without additional staining so that reconstructed neurons could reliably be placed in the appropriate class. This work is unique in looking at a large number of individual neurons of the same class to determine what is consistent and what is variable about their class-specific morphologies.

      This means that in addition to providing specific structural information about these particular cells, the authors explore broader questions of how much morphological diversity exists between sensory neurons of the same class and how different dendritic morphologies might affect sensory and physiological properties of neurons.

      The authors found that CO2-sensing neurons have an unusual, sheet-like morphology in contrast to the thin branches of odor-sensing neurons. They show that this morphology greatly increases the surface area to volume ratio above what could be achieved by modest branching of thin dendrites, and posit that this might be important for their sensory function, though this was not directly tested in their study. The study is mainly descriptive in nature, but thorough, and provides a nice jumping-off point for future functional studies. One interesting future analysis could be to examine all four cell types within a single sensillum together to see if there are any general correlations that could reveal insights about how morphology is determined and the relative contributions of intrinsic mechanisms vs interactions with neighboring cells. For example, higher than average branching in one cell type might correlate with higher than average branching in another type in the same sensillum. This might suggest higher extracellular growth or branching cues within that sensillum. Conversely, if higher branching in one cell type consistently leads to reduced length or branching in another, this might point to dendrite-dendrite interactions between cells undergoing competitive or repulsive interactions to define territories within each sensillum as a major determinant of the variability.
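      The geometric argument about surface area to volume ratio can be illustrated with idealized shapes. The sketch below is not taken from the study: it assumes dendrites of uniform cross-section along their length (so that surface area to volume equals cross-sectional perimeter to area), models branches as identical circular cylinders and a sheet as a thin rectangle, and uses hypothetical dimensions and function names.

```python
import math


def cylinders_sa_to_vol(total_cross_section, n_branches=1):
    """Surface-to-volume ratio of n identical cylindrical branches.

    The branches share the given total cross-sectional area; end caps are
    ignored, so the ratio for each branch is 2 / radius.
    """
    radius = math.sqrt(total_cross_section / (n_branches * math.pi))
    return 2.0 / radius


def sheet_sa_to_vol(width, thickness):
    """Surface-to-volume ratio of a flat sheet with rectangular cross-section.

    Computed as cross-sectional perimeter divided by cross-sectional area;
    the two end faces are ignored.
    """
    return 2.0 * (width + thickness) / (width * thickness)


# Hypothetical dimensions in micrometres, for illustration only.
width, thickness = 2.0, 0.05
area = width * thickness

single = cylinders_sa_to_vol(area)        # one unbranched cylinder
four = cylinders_sa_to_vol(area, 4)       # same material split into 4 branches
sheet = sheet_sa_to_vol(width, thickness) # same material flattened into a sheet
```

      Under these assumptions, splitting one cylinder into n branches raises the ratio only by a factor of sqrt(n), whereas flattening the same cross-sectional area into a thin sheet raises it roughly in proportion to 1/thickness, which is consistent with the reviewer's summary that modest branching cannot match a sheet.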

      We thank the reviewer for the insightful comments and appreciation for our study.

      Reviewer #2 (Public review):

      Summary:

      The manuscript employs serial block‐face electron microscopy (SBEM) and cryofixation to obtain high‐resolution, three‐dimensional reconstructions of Drosophila antennal sensilla containing olfactory receptor neurons (ORNs) that detect CO2. This method has been used previously by the same lab in Gonzales et al., 2021 (https://elifesciences.org/articles/69896), which had provided an exemplary model by integrating high-resolution EM with electrophysiology and cell-type-specific labeling.

      We thank the reviewer for expressing appreciation for our published study.

      The previous study ended up correlating morphology with activity for multiple olfactory sensillar types. Compared to the 2021 study, this current manuscript appears somewhat incomplete and lacks integration with activity.

      We thank the reviewer for their feedback. However, we would like to clarify that our previous study did not correlate morphology with activity to a greater extent than the current study. Both employed the same cryofixation, SBEM-based approach without recording odor-induced activity, but the focus of the current work is fundamentally different. While the previous study examined multiple sensillum types, the current study concentrates on a single sensillum type to address a distinct biological question regarding morphological heterogeneity. We appreciate the opportunity to clarify this distinction, and we hope that the revised manuscript more clearly conveys the unique scope and contributions of this study.

      In fact, older studies have also reported two-dimensional TEM images of the putative CO2 neuron in Drosophila (Shanbhag et al., 1999) and in mosquitoes (McIver and Siemicki, 1975; Lu et al., 2007), and in these instances reported that the dendritic architecture of the CO2 neuron was somewhat different (circular and flattened, lamellated) from other olfactory neurons.

      We thank the reviewer for pointing this out. As noted in both the Introduction and Discussion sections, previous studies—including those cited by the reviewer—suggested that CO2-sensing neurons may have a distinct dendritic morphology. However, those earlier studies lacked the means to definitively link the observed morphology to CO2 neuron identity.

      In contrast, our study assigns neuronal identity based on quantitative morphometric measurements, allowing us to confidently associate the unique dendritic architecture with CO2 neurons. Furthermore, we extend previous observations by providing full 3D reconstructions and nanoscale morphometric analyses, offering a much more comprehensive and definitive characterization of these neurons. We believe this represents a significant advancement over earlier work.

      The authors claim that this approach offers an artifact‐minimized ultrastructural dataset compared to earlier studies. In this study, not only do they confirm this different morphology but they also classify it into distinct subtypes (loosely curled, fully curled, split, and mixed). This detailed morphological categorization was not provided in prior studies (e.g., Shanbhag et al., 1999).

      We thank the reviewer for acknowledging the significance of our study.

      The authors would benefit from providing quantitative thresholds or objective metrics to improve reproducibility and to clarify whether these structural distinctions correlate with distinct functional roles.

      We thank the reviewer for raising this point. However, we would like to clarify that assigning neurons to strict morphological subtypes was not the primary aim of our study. In practice, dendritic architectures can be highly complex, with individual neurons often displaying features characteristic of multiple subtypes. This is precisely why we included a “mixed” subtype category—to acknowledge and capture this morphological heterogeneity rather than impose rigid classification boundaries.

      Our intent in defining subtypes was not to imply discrete functional classes, but rather to highlight the range of morphological variation observed across ab1C neurons. While we agree that exploring potential correlations between structure and function is an important future direction, the current study focuses on characterizing this diversity using 3D reconstruction and morphometric analysis. We hope this clarifies the purpose and scope of our morphological categorization.

      Strengths:

      The study makes a convincing case that ab1C neurons exhibit a unique, flattened dendritic morphology unlike the cylindrical dendrites found in ab1D neurons. This observation extends previous qualitative TEM findings by not only confirming the presence of flattened lamellae in CO₂ neurons but also quantifying key morphometrics such as dendritic length, surface area, and volume, and calculating surface area-to-volume ratios. The enhanced ratios observed in the flattened segments are speculated to be linked to potential advantages in receptor distribution (e.g., Gr21a/Gr63a) and efficient signal propagation.
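
      The geometric intuition behind the higher surface area-to-volume ratio of flattened dendrites can be sketched with idealized shapes. The snippet below is an illustration under stated assumptions, not the morphometric pipeline used in the study: a cylindrical dendrite is compared with a flat sheet of rectangular cross-section and equal cross-sectional area, end faces are ignored, and all dimensions (in µm) are made-up values.

```python
import math


def cylinder_sa_to_v(radius):
    """Lateral surface area / volume of a cylinder: (2*pi*r*L) / (pi*r^2*L) = 2/r."""
    return 2.0 / radius


def sheet_sa_to_v(width, thickness):
    """Lateral surface area / volume of a flat sheet: 2*(w+t)*L / (w*t*L)."""
    return 2.0 * (width + thickness) / (width * thickness)


def sheet_width_for_equal_area(radius, thickness):
    """Sheet width giving the same cross-sectional area as a cylinder of this radius."""
    return math.pi * radius ** 2 / thickness


# Hypothetical dimensions in micrometres, chosen only for illustration.
radius = 0.1
thickness = 0.02
width = sheet_width_for_equal_area(radius, thickness)

print(f"cylinder SA:V = {cylinder_sa_to_v(radius):.1f} per um")
print(f"sheet    SA:V = {sheet_sa_to_v(width, thickness):.1f} per um")
```

      For a fixed volume per unit length, the ratio scales roughly with the inverse of the thinnest dimension, so flattening a process raises it well beyond what a cylinder of the same cross-sectional area provides.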

      We thank the reviewer for appreciating the significance of our current study.

      Weaknesses:

      While the manuscript offers valuable ultrastructural insights and reveals previously unappreciated heterogeneity among CO₂-sensing neurons, several issues warrant further investigation in addition to the points made above.

      (1) Although this quantitative approach is robust compared to earlier descriptive reports, its impact is somewhat limited by the absence of direct electrophysiological data to confirm that ultrastructural differences translate into altered neuronal function. A direct comparison or discussion of how the present findings align with the functional data obtained from electrophysiology would strengthen the overall argument.

      We thank the reviewer for this comment. We would like to clarify, however, that our study does not claim that the observed morphological heterogeneity necessarily leads to functional diversity. Rather, we consider this a possible implication and discuss it as a potential question for future research. This idea is raised only in the Discussion section, and we are careful not to present functional diversity as a conclusion of our study. Nonetheless, we have reviewed the relevant paragraph to ensure the language remains cautious and does not overstate our interpretation.

      We also acknowledge the significance of directly linking ultrastructural features to neuronal function through electrophysiological recordings. However, at present, it is technically challenging to correlate the nanoscale morphology of individual ORNs with their functional activity, as this would require volume EM imaging of the very same neurons that were recorded via electrophysiology. Currently, there is no dye-labeling method compatible with single-sensillum recording and SBEM sample preparation that allows for unambiguous identification and segmentation of recorded ORNs at the necessary ultrastructural resolution.

      To acknowledge this important limitation, we have added a paragraph in the Discussion section, as suggested, to clarify the current technical barriers and to highlight this as a promising direction for future methodological advances.

      (2) Clarifying the criteria for dendritic subtype classification with quantitative parameters would enhance reproducibility and interpretability. Moreover, incorporating electrophysiological recordings from ab1C neurons would provide compelling evidence linking structure and function, and mapping key receptor proteins through immunolabeling could directly correlate receptor distribution with the observed morphological diversity.

      Please see our response to the comment regarding the technical limitations of directly correlating ultrastructure with electrophysiological data.

      In addition, we would like to address the suggestion of using immunolabeling to map receptor distribution in relation to the 3D EM models. Currently, antibodies against Gr21a or Gr63a (the receptors expressed in ab1C neurons) are not available. Even if such antibodies were available, immunogold labeling for electron microscopy requires harsh detergent treatment to increase antibody permeability, damaging morphological integrity. These treatments would compromise the very morphological detail that our study aims to capture and quantify.

      (3) Even though cryofixation is claimed to be superior to chemical fixation for generating fewer artifacts, the authors need to confirm independently the variation observed in the CO2 neuron morphologies across populations. All types of fixation in TEM cause some artifacts, as does serial sectioning. Without understanding the error rates or without independent validation with another method, it is hard to have confidence in the conclusions drawn by the authors of the paper.

      We thank the reviewer for raising concerns regarding potential artifacts in morphological analyses. However, we would like to clarify that cryofixation is widely regarded as a gold standard for ultrastructural preservation and minimizing fixation-induced artifacts, as supported by extensive literature. This is why we adopted high-pressure freezing and freeze substitution in our study.

      We have also published a separate methods paper (Tsang et al., eLife, 2018) directly comparing our cryofixation-based protocol with conventional chemical fixation, demonstrating substantial improvements in morphological preservation (see the image below, adapted from Figure 2 of our 2018 eLife paper). This provides strong empirical support for the reliability of our approach.

      Author response image 1.

      Regarding the suggestion to validate observed morphological variation across populations: we note that determining the presence of artifacts requires a known ground truth, which is inherently unavailable as we could not measure the morphometrics of fly olfactory receptor neurons in their native state. In the absence of such a benchmark, we have instead prioritized using the best-available preparation methods and high-resolution imaging to ensure structural integrity.

      Addressing these concerns and integrating additional experiments would significantly bolster the manuscript's completeness and advancement.

      We appreciate the reviewer’s feedback. As discussed in our responses to the specific comments above, certain suggested experiments are currently limited by technical constraints, particularly in the context of high-resolution volume EM for insect tissues enclosed in cuticles.

      Nevertheless, we have carefully addressed the reviewer’s concerns to the fullest extent possible within the scope of this study. We have revised the manuscript to clarify methodological limitations, added new explanatory content where appropriate, and ensured that our interpretations remain well grounded in the data. We hope these revisions strengthen the clarity and completeness of the manuscript.

      Reviewer #3 (Public review):

      Summary:

      In the current manuscript entitled "Population-level morphological analysis of paired CO2- and odor-sensing olfactory neurons in D. melanogaster via volume electron microscopy", Choy, Charara et al. use volume electron microscopy and neuron reconstruction to compare the dendritic morphology of ab1C and ab1D neurons of the Drosophila basiconic ab1 sensillum. They aim to investigate the degree of dendritic heterogeneity within a functional class of neurons using ab1C and ab1D, which they can identify due to the unique feature of ab1 sensilla to house four neurons and the stereotypic location on the third antennal segment. This is a great use of volumetric electron imaging and neuron reconstruction to sample a population of neurons of the same type. Their data convincingly show that there is dendritic heterogeneity in both investigated populations, and their sample size is sufficient to strongly support this observation. These data suggest that the phenomenon of dendritic heterogeneity is common in the Drosophila olfactory system and will stimulate future investigations into the developmental origin, functional implications, and potential adaptive advantage of this feature.

      Moreover, the authors discovered that there is a difference between CO2- and odour-sensing neurons, of which the former show a characteristic flattened and sheet-like structure not observed in other sensory neurons sampled in this and previous studies. They hypothesize that this unique dendritic organization, which increases the surface area to volume ratio, might allow more efficient CO2 sensing by housing higher numbers of CO2 receptors. This is supported by previous attempts to express CO2 sensors in olfactory sensory neurons, which lack this dendritic morphology, resulting in lower CO2 sensitivity compared to endogenous neurons.

      Overall, this detailed morphological description of olfactory sensory neurons' dendrites convincingly shows heterogeneity in two neuron classes with potential functional impacts for odour sensing.

      Strength:

      The volumetric EM imaging and reconstruction approach offers unprecedented details in single cell morphology and compares dendrite heterogeneity across a great fraction of ab1 sensilla. The authors identify specific shapes for ab1C sensilla potentially linked to their unique function in CO2 sensing.

      We thank the reviewer for the insightful comments and appreciation for our study.

      Weaknesses:

      While the morphological description is highly detailed, no attempts are made to link this to odour sensitivity or other properties of the neurons. It would have been exciting to see how altered morphology impacts physiology in these olfactory sensory cells.

      We agree that linking morphological variation to physiological properties, such as odor sensitivity, would be a highly valuable direction for future research. However, the aim of the current study is to provide an in-depth nanoscale characterization based on a substantial proportion of ab1 sensilla, highlighting morphological heterogeneity among homotypic ORNs.

      At present, it is technically challenging to correlate the nanoscale morphology of individual ORNs with their physiological responses, as this would require volume EM imaging of the exact neurons recorded via single-sensillum electrophysiology. Currently, no dye-labeling method exists that is compatible with both single-sensillum recording and the stringent requirements of SBEM sample preparation to allow for unambiguous identification and segmentation of recorded ORNs.

      To acknowledge this important limitation, we have added a paragraph in the Discussion section clarifying the current technical barriers and highlighting this as a promising area for future methodological development. Please also see our responses to the reviewer’s 4th comment below, where we present preliminary experiments examining whether odor sensitivity varies among homotypic ORNs.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study, the authors showed that enalapril was able to reduce cellular senescence and improve health status in aged mice. The authors further showed that phosphorylated Smad1/5/9 was significantly elevated and blocking this pathway attenuated the protection of cells from senescence. When middle-aged mice were treated with enalapril, the physiological performance in several tissues, including memory capacity, renal function, and muscle strength, exhibited significant improvement.

      Strengths:

      The strength of the study lies in the identification of the pSMAD1/5/9 pathway as the underlying mechanism mediating the anti-senescence effects of enalapril with comprehensive evaluation both in vitro and in vivo.

      Thanks very much for your insightful evaluation and the constructive suggestions. We have thoroughly studied the comments and a provisional point-by-point response is shown as follows.

      Weaknesses:

      The major weakness of the study is the in vivo data. Despite the evidence shown in the in vitro study, there are no data to show that blocking the pSmad1/5/9 pathway is able to attenuate the anti-aging effects of enalapril in the mice. In addition, the mitigation of aging phenotypes by enalapril is not supported by evidence of lifespan extension.

      Thanks for your comment. As suggested, we will feed LDN193189 to mice to block pSmad1/5/9, and will assess age-related phenotypes in the mice to demonstrate that the anti-aging effect of enalapril in mice is mediated through pSmad1/5/9.

      We assessed only the improvement in the health status of the aging mice, which indicates that enalapril can extend the healthy lifespan of aging mice. This is because we believe that lifespan is controlled by genetics. Therefore, this study focuses solely on the improvement of health phenotypes in aging mice by enalapril.

      It is necessary to show that NAC is able to attenuate the effects of enalapril in the aging mice. In addition, it would be beneficial to test if enalapril is able to achieve similar rescue in a premature aging mouse model.

      Thanks for your suggestion. To our knowledge, NAC is an inhibitor of ROS, which is consistent with the antioxidant effect of enalapril. Therefore, we believe that NAC will not diminish the effect of enalapril.

      For the premature aging mouse models, we examined the effect of enalapril on Lmna<sup>G609G</sup> mice and other premature aging models and found that the effect was relatively modest. This may be due to differences in the genetic background of premature aging mice, leading to a less pronounced effect of enalapril compared to its impact on naturally aged mice.

      Reviewer #2 (Public review):

      This manuscript presents an interesting study of enalapril for its potential impact on senescence through the activation of Smad1/5/9 signaling with a focus on antioxidative gene expression. Repurposing enalapril in this context provides a fresh perspective on its effects beyond blood pressure regulation. The authors make a strong case for the importance of Smad1/5/9 in this process, and the inclusion of both in vitro and in vivo models adds value to the findings. Below, I have a few comments and suggestions which may help improve the manuscript.

      Thanks very much for your insightful evaluation and the constructive suggestions. We have thoroughly studied the comments and a provisional point-by-point response is shown as follows.

      A major finding in the study is that phosphorylated Smad1/5/9 mediates the effects of enalapril. However, the manuscript focused on the Smad pathway relatively abruptly, and the rationale behind targeting this specific pathway is not fully explained. What makes Smad1/5/9 particularly relevant to the context of this study?

      Thanks for your comment. As stated in the manuscript, after we found that enalapril could improve the cellular senescence phenotype, we screened and examined key targets in important aging-related signaling pathways, such as AKT, mTOR, ERK (Fig. S2A), Smad2/3 and Smad1/5/9 (Fig. 2A). We found that only the phosphorylation levels of Smad1/5/9 significantly increased after enalapril treatment. Therefore, the subsequent focus of this study is on pSmad1/5/9.

      Furthermore, their finding that activation of Smad1/5/9 leads to a reduction of senescence appears somewhat contradictory to the established literature on Smad1/5/9 in senescence. For instance, studies have shown that BMP4-induced senescence involves the activation of Smad1/5/8 (Smad1/5/9), leading to the upregulation of senescence markers like p16 and p21 (JBC, 2009, 284, 12153). Similarly, phosphorylated Smad1/5/8 has been shown to promote and maintain senescence in Ras-activated cells (PLOS Genetics, 2011, 7, e1002359). Could the authors provide more detailed mechanistic insights into why enalapril seems to reverse the typical pro-senescent role of Smad1/5/9 in their study?

      Thanks for your comment. The downstream regulatory network of BMP-pSmad1/5/9 is highly complex. The BMP-SMAD-ID axis has been mentioned in many studies, and its downstream signaling inhibits the expression of p16 and p21 (PNAS, 2016, 113(46), 13057-13062; Cell, 2003, 115(3), 281-292). Additionally, studies have also found that the Smad1-Stat1-P21 axis inhibits osteoblast senescence (Cell Death Discovery, 2022, 8:254). In our study, enalapril was found to increase the expression of ID1, which is a classic downstream target of pSmad1/5/9 (Cell Stem Cell, 2014, 15(5), 619-633). Therefore, pSmad1/5/9 inhibits cellular senescence markers such as p16, p21 and SASP through ID1, thereby promoting cell proliferation (Fig. 3). Furthermore, we also found that pSmad1/5/9 increases the expression of antioxidant genes and reduces ROS levels, exerting antioxidant effects (Fig. 4). Together, ID1 and antioxidant genes enable pSmad1/5/9 to exert its anti-aging effects.

      While the authors showed that enalapril increases pSmad1/5/9 phosphorylation, what are the expression levels of other key and related factors like Smad4, pSmad2, pSmad3, BMP2, and BMP4 in both senescent and non-senescent cells? These data will help clarify the broader signaling effects.

      Thanks for your suggestion. We observed an increase in Smad4 expression, while the levels of pSmad2 and pSmad3 remained unchanged after enalapril treatment (Fig. 2A). We will supplement data on the expression changes of these key factors in both senescent and non-senescent cells.

      They used BMP receptor inhibitor LDN193189 to pharmacologically inhibit BMP signaling, but it would be more convincing to also include genetic validation (e.g., knockdown or knockout of BMP2 or BMP4). This will help confirm that the observed effects are truly due to BMP-Smad signaling and not off-target effects of the pharmacological inhibitor LDN.

      Thanks for your suggestion. We will use shRNA or siRNA to knock down BMP and examine the related changes to clarify the role of BMP-Smad signaling.

      I don't see the results on the changes in senescence markers p16 and p21 in the mouse models treated with enalapril. Similarly, the effects of enalapril treatment on some key SASP factors, such as TNF-α, MCP-1, IL-1β, and IL-1α, are missing, particularly in serum and tissues. These are important data to evaluate the effect of enalapril on senescence.

      Thanks for your comment. As for the markers p16 and p21, we observed no change in p16, while the changes in p21 varied across different organs and tissues (Author response image 1). Nevertheless, behavioral experiments and physiological and biochemical indicators at the individual level consistently demonstrated the significant anti-aging effects of enalapril (Fig. 6).

      Author response image 1.

      p21 (Cdkn1a) expression levels in organs of mice after enalapril feeding.

      We also examined the changes in SASP factors in the serum of mice after enalapril treatment. Notably, SASP factors such as CCL (MCP), CXCL and TNFRSF11B showed significant decreases (Fig. 5C). The expression changes of SASP factors varied across different organs. In the liver, kidneys and spleen, the expression of IL1a and IL1b decreased, while TNFRSF11B expression decreased in both the liver and muscles (Fig. 5B). Additionally, CCL (MCP) levels decreased in all organs (Fig. 5B).

      Given that enalapril is primarily known as an antihypertensive, it would be helpful to include data on how it affects blood pressure in the aged mouse models, such as systolic and diastolic blood pressure. This will clarify whether the observed effects are independent of or influenced by changes in blood pressure.

      Thanks for your comment. We measured the blood pressure in mice, and found no significant change in blood pressure after enalapril treatment, which has also been validated in other studies (J Gerontol A Biol Sci Med Sci, 2019, 74(8), 1149–1157). Therefore, our results are independent of changes in blood pressure.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      Through a series of CRISPR-Cas9 screens, the GPX4 antioxidant pathway was identified as a critical suppressor of cold-induced cell death in hibernator-derived cells. Hamster BHK-21 cells exposed to repeated cold and rewarming cycles revealed five genes (Gpx4, Eefsec, Pstk, Secisbp2, and Sepsecs) as critical components of the GPX4 pathway, which protects against cold-induced ferroptosis. A second screen with continuous cold exposure confirmed the essential role of GPX4 in prolonged cold tolerance. GPX4 knockout lines exhibited complete cell death within four days of cold exposure, and pharmacological inhibition of GPX4 further increased cell death, underscoring the necessity of GPX4's catalytic activity in cold conditions.

      An additional CRISPR screen in human cold-sensitive K562 cells identified 176 genes for cold survival. The GPX4 pathway was found to confer significant resistance to cold in hibernators and human cells, with GPX4 loss significantly increasing cold-induced cell death.

      Comparing hamster and human GPX4, overexpression of GPX4 in human K562 cells, whether hamster or human GPX4, dramatically improved cold tolerance, while catalytically dead mutants showed no such effect. These findings suggest that GPX4 abundance is a key limiting factor for cold tolerance in human cells, and primary cell types show strong sensitivity to GPX4 loss, highlighting that differences in cold tolerance across species may be due to varying GPX4-mediated protection.

      Strengths:

      (1) Innovative Approach: The study employs a series of unbiased genome-wide CRISPR-Cas9 screens in both hibernator- and non-hibernator-derived cells to investigate the mechanisms controlling cellular cold tolerance. Notably, this is the first genome-scale CRISPR-Cas9 screen conducted in cells derived from a hibernator, the Syrian hamster.

      (2) Identification of the GPX4 Pathway: Identifying glutathione peroxidase 4 (GPX4) as a critical suppressor of cold-induced cell death significantly contributes to the field. Recently, GPX4 was also reported as a potent regulator of cold tolerance through overexpression screening (Sone et al.) in hamsters, which further supports this finding.

      (3) Improved Cold Viability Assessment: The study identifies an important technical artifact in using trypan blue to assess cell viability following cold exposure. It reveals that cells stained immediately after cold exposure retain the dye, inaccurately indicating cell death. By introducing a brief rewarming period before viability assessment, the authors significantly improve the accuracy of detecting cold-induced cell death. This refinement in methodology ensures more reliable results and sets a new standard for future research on cold stress in cells.

      Weaknesses:

      (1) Mechanisms Regulating GPX4 Levels: While the study highlights GPX4 levels as a major determinant of cellular cold tolerance, it does not discuss how these levels are regulated or why they differ between hibernators and non-hibernators. This omission leaves an important aspect of GPX4's role in cold tolerance unexplored.

      (2) Generalizability Across Species: Although the study demonstrates the role of GPX4 in several mammalian species, it does not investigate whether this mechanism extends to other vertebrates (e.g., fish and amphibians) that also face cold challenges. This limitation could restrict the broader evolutionary claims made by the study.

      (3) Variability in Cold Sensitivity Across Human Cell Lines: The study observes significant variability in cold tolerance among different human cell lines but does not explain these differences clearly. This leaves a key aspect of human cell cold sensitivity insufficiently addressed.

      We thank the reviewer for the positive evaluation and thoughtful comments on the manuscript. We acknowledge that our study does not delve into the mechanisms regulating GPX4 levels, including differences between hibernators and non-hibernators, differences between cell types, or the possibility that GPX4 levels are dynamically regulated by environmental conditions. We consider these as interesting open questions that could be addressed in future studies.

      While our study focused entirely on mammalian species, we agree that examining cold tolerance mechanisms across a broader range of vertebrates, including fish and amphibians, could enhance our evolutionary perspective. Interestingly, previous work has indicated that C. elegans adapts to cold temperatures through ferritin-mediated Fe2+ detoxification. This suggests that cold induces Fe2+-mediated toxicity in C. elegans as well as in mammalian cells, but that the mechanisms through which distantly related species counteract cold-mediated cell death may vary.

      Finally, we agree that the variability in cold sensitivity across human cell lines could be further explored, and we will strongly consider conducting follow-up experiments to examine the extent to which this variability is driven by levels of GPX4.

      We are grateful for these insightful comments, as they highlight important avenues for future research. Addressing these questions will enable a more comprehensive understanding of GPX4's role in cold tolerance and its evolutionary significance across diverse organisms.

      Reviewer #2 (Public review):

      Summary:

      Lam et al. present a very intriguing whole-genome CRISPR screen in Syrian hamster cells as well as K562 cells to identify key genes involved in hypothermia-rewarming tolerance. Survival screens were performed by exposing cells to 4 °C in a cooled CO2 incubator followed by a rewarming period of 30 minutes prior to survival analysis. In this paradigm, Syrian hamster-derived cell lines exhibit more robust survival than human cell lines (BHK-21 and HaK vs HT1080, HeLa, RPE1, and K562). A genome-wide Syrian hamster CRISPR library was created targeting all annotated genes with 10 guides/gene. LV transduction of the library was performed in BHK-21 cells and the survival screen procedures involved 3 cycles of 4 °C cold exposure for 4 days followed by 2 days of re-warming.

      When compared to controls maintained at 37 °C, 9 genes were required for BHK-21 survival of cold cycling conditions and 5 of these 9 are known components of the GPX4 antioxidant pathway. GPX4 KO BHK-21 cells had reduced cell growth at 37 °C and profoundly worse cold tolerance, which could be rescued by GPX4 expression. GPX4 inhibitors also reduced survival in cold. CRISPR KO screens and GPX4 KO in K562 cells revealed comparable results (though intriguingly glutathione biosynthesis genes were more critical to K562 cells than BHK-21 cells). Human or Syrian hamster GPX4 overexpression improved cold tolerance.
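
      The comparison of cold-cycled cells against 37 °C controls in a pooled screen can be illustrated with a simple gene-level depletion score. This is a minimal sketch under stated assumptions and not the analysis used in the manuscript: the function names, the pseudocount, the median aggregation across guides, and the toy read counts are all hypothetical.

```python
import math
from statistics import median


def guide_log2fc(cold_count, control_count, cold_total, control_total, pseudocount=1.0):
    """log2 ratio of depth-normalized guide abundance, cold vs 37 C control."""
    cold_norm = (cold_count + pseudocount) / cold_total
    control_norm = (control_count + pseudocount) / control_total
    return math.log2(cold_norm / control_norm)


def gene_scores(counts, pseudocount=1.0):
    """Median guide log2 fold change per gene.

    counts maps gene -> list of (cold_count, control_count), one pair per guide.
    """
    cold_total = sum(cold for guides in counts.values() for cold, _ in guides)
    control_total = sum(ctrl for guides in counts.values() for _, ctrl in guides)
    return {
        gene: median(
            [guide_log2fc(cold, ctrl, cold_total, control_total, pseudocount)
             for cold, ctrl in guides]
        )
        for gene, guides in counts.items()
    }


# Toy read counts (hypothetical): geneA guides drop out in the cold, geneB guides do not.
toy_counts = {
    "geneA": [(9, 99), (19, 199)],
    "geneB": [(99, 99), (199, 199)],
}
scores = gene_scores(toy_counts)
for gene, score in sorted(scores.items(), key=lambda item: item[1]):
    print(gene, round(score, 2))
```

      Genes whose guides are consistently depleted after cold cycling receive strongly negative scores, which is the pattern expected for genes required for cold survival.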

      Strengths:

      This is a very nicely written paper that clearly communicates in figures and text complicated experimental manipulations and in vitro genetic screening and cell survival data. The focus on GPX4 is interesting and relatively novel. The converging pharmacologic, loss-of-function, and gain-of-function experiments are also a strength.

      Weaknesses:

      A recently published article (Reference 43, Sone et al.) also independently explored the role of GPX4 in Syrian hamster cold tolerance through gain-of-function screening. Further exploration of the GPX4 species-specific mechanisms would be of great interest, but this is considered a minor weakness given the already very comprehensive and compelling data presented.

      We greatly appreciate the reviewer’s compliments and thoughtful comments on our manuscript. We agree with the reviewer that our approach (dual unbiased genome-scale screens in human and hamster cells) and the recent investigation by Sone et al. (gain-of-function screening involving the insertion of hamster cDNA into human cells) mutually strengthen the importance of GPX4 in cold tolerance across cell types and species.

      Reviewer #3 (Public review):

      Summary:

      This work aims to address a fundamental biological question: how do mammalian cells achieve/lose tolerance to cold exposure? The authors first tried to establish an experimental system for cell cold exposure and evaluation of cell death and then performed genome-scale CRISPR-Cas9 screening on immortalized cell lines from Syrian Hamster (BHK-21) and human (K562) for key genes that are associated with cell survival during prolonged cold exposure. From these screenings, they focused on glutathione peroxidase 4 (GPX4). Using genetic modifications or pharmacological interventions, and multiple cell models including primary cells from various mammalian species, they showed that GPX4 proteins are likely to retain their activities at 4 °C, functioning to prevent cold-induced cell ferroptosis.

      Strengths:

      (1) This paper is neatly written and hence easy to follow.

      (2) Experiments are well designed.

      (3) The data showing the overall good cell survival after a prolonged cold exposure or repeated cold-warm cycles are helpful to show the advantages of the experimental instruments and methods the authors used, and hence the validity of their results.

      (4) The CRISPR-Cas9 screening is a great attempt.

      (5) Multiple cell types from hibernating mammals (cold tolerant) and cold-intolerant species are used to test their findings.

      (6) Although some may argue that other labs have published works with different approaches that have pointed out the importance of GPX4 and ferroptosis in hamster cell survival from anoxia-reoxygenation or cold exposure models, hence hurting the novelty of this work, this reviewer thinks that it is highly valuable to have independent research groups and different methods/systems to validate an important concept.

      Weaknesses:

      (1) Only cell death was robustly surveyed; though cell proliferation was evaluated too in some experiments, other cellular functions, such as mitochondrial ATP production vs. glycolysis, and the extent of lipid peroxidation, could have been measured to reflect cellular physiology.

      Validations on complex tissues or in vivo systems would have further strengthened the work and its impact.

      CRISPR-Cas9 screening may have technical limitations as knock-out of some essential genes/pathways may lead to cell lethality during screening, and hence the relevance of these genes/pathways to cell cold tolerance may not be noted. From the data presented in this study, this reviewer thinks that the GPX4 pathway is likely a conserved mechanism for long-term cold survival, but not for cold sensitivity or acute cell death from cold exposure. In line with this speculation, their CRISPR-Cas9 screening revealed genes in the GPX4 pathway from a relatively cold-sensitive human cell line, but the endogenous GPX4 pathway is seemingly operational in this cold-sensitive cell line. Also, these cells are viable after GPX4 knock-out. Dead cells from the acute cold exposure phase may have detached, or their genomic DNA may have been severely damaged by the time of sample collection, hence not giving any meaningful sequencing reads. Crippling other factors/pathways such as FOXO1 (PMID: 38570500) or 5-aminolevulinic acid (ALA) metabolism (PMID: 35401816) has been shown to severely aggravate cold-induced cell death, including TUNEL-revealed DNA damage, within a much shorter time scale, whilst loss-of-function knockouts of FOXO1 or ALA Synthase 1 (ALAS1) are usually cell lethal. Thus, they and other possible essential genes may not be screenable with the current experimental protocol. These important points need to be taken into consideration by the authors.

      We thank the reviewer for highlighting the novelty of using genome-scale CRISPR-Cas9 screens and the validation of GPX4 function across cell types and mammalian species. 

      We acknowledge that our study primarily focused on measuring cell death using Trypan Blue dye exclusion. To validate the Trypan Blue assay, cell survival was orthogonally measured using LDH release assays (Fig. 1g). The proliferation potential of putatively live cells was assessed by counting the increase in live cells following 24 h at 37°C (Fig. 1b). Prompted by your question, we will add additional data to the final version of the manuscript in which we show that following 1 day at 4°C, K562 cells rapidly restarted their cell cycle and doubled in number every 21 hours (Author response image 1). This rate is indistinguishable from the replication rate of cells that were not previously exposed to 4°C, suggesting that the cells following cold exposure are both alive and functionally capable of replicating.

      Author response image 1.

      Population doubling time of K562 cells cultured at 37°C (pink) and cells that are rewarmed to 37°C following 1 day of 4°C exposure
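
      As a minimal sketch of how a population doubling time such as the 21 hours quoted above can be derived from two cell counts, the following assumes simple exponential growth between the two time points. The function name and the counts are hypothetical illustrations, not the authors' data or analysis code.

```python
import math


def doubling_time(n_start, n_end, elapsed_hours):
    """Return the population doubling time in hours.

    Assumes exponential growth: N(t) = N0 * 2 ** (t / Td),
    so Td = t * ln(2) / ln(N(t) / N0).
    """
    if n_start <= 0 or n_end <= n_start:
        raise ValueError("requires 0 < n_start < n_end")
    return elapsed_hours * math.log(2) / math.log(n_end / n_start)


# Hypothetical counts: a culture that grows 4-fold in 42 h
# has undergone two doublings, i.e. roughly 21 h per doubling.
td_hours = doubling_time(1.0e5, 4.0e5, 42.0)
print(f"doubling time: {td_hours:.1f} h")
```

      Comparing this value between previously chilled and never-chilled cultures is one way to express the claim that replication rates are indistinguishable.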

      We agree that assessing additional cellular functions, such as mitochondrial ATP production, glycolysis, lipid metabolism and peroxidation could provide a more comprehensive understanding of cellular physiology under cold stress and would be valuable future studies. Similarly, we appreciate the suggestion to validate our findings in complex tissues or in vivo models. We recognize that such validation could strengthen the implications of our study and enhance its translational potential; however, due to their complexity, we believe that these additional studies are beyond the scope of our current study.

      We agree with the reviewer that CRISPR-Cas9 screens have limitations. For example, our screen was designed to identify genes that are preferentially required for cellular fitness at 4°C versus 37°C. There are many genes that are required for cellular survival at 4°C as well as 37°C that are not discussed (Table S2, S5). Also, given that the screen is designed to disrupt a single gene per cell, genes that have redundant functions in cold tolerance will likely be missed. Given the reviewer’s questions, we will expand the discussion of the paper to highlight limitations of the screen.

      We apologize for any lack of clarity about the methods we employed during the screen and will expand the methods section to provide further details. For example, for the BHK-21 screen we eliminated dead cells by sequencing cells that reattached after rewarming to 37°C for either 30 minutes (15 day cold exposure screen) or 24 hours (4°C cycling screen). Indeed, at the point of cell collection for both BHK-21 and K562 screens, the fraction of live cells was greater than 92% and 95%, respectively. We respectfully disagree with the reviewer that our screens would miss genes that affect acute cold tolerance. Any cells that died either early or late during cold exposure would not have been sequenced, and thus the sgRNAs targeting a specific gene in those cells would appear depleted, regardless of whether these cells died early/acutely or later during cold exposure.
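
      The depletion argument above can be illustrated with a minimal sketch, assuming a simple per-guide log2 fold change of library-size-normalized read counts between the cold and 37°C arms. The guide names, counts, and pseudocount are hypothetical, and this is not the authors' analysis pipeline.

```python
import math


def log2_fold_change(cold_counts, warm_counts, pseudocount=1.0):
    """Per-guide log2 ratio of normalized read counts (cold arm vs. 37°C arm).

    Counts are scaled to reads per million so that arms sequenced to
    different depths are comparable. Guides whose cells died at any point
    during cold exposure contribute few reads and score negative.
    """
    cold_total = sum(cold_counts.values())
    warm_total = sum(warm_counts.values())
    scores = {}
    for guide, warm_reads in warm_counts.items():
        cold_rpm = 1e6 * cold_counts.get(guide, 0) / cold_total
        warm_rpm = 1e6 * warm_reads / warm_total
        scores[guide] = math.log2((cold_rpm + pseudocount) / (warm_rpm + pseudocount))
    return scores


# Hypothetical read counts for one targeting guide and two control guides.
warm = {"sgGENE_A": 500, "sgCONTROL_1": 500, "sgCONTROL_2": 500}
cold = {"sgGENE_A": 50, "sgCONTROL_1": 700, "sgCONTROL_2": 750}

scores = log2_fold_change(cold, warm)
for guide, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{guide}: {score:+.2f}")
```

      Because only the endpoint read counts enter the score, the measure does not distinguish early from late death, which is the point made in the response.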

      We thank the reviewer for pointing out two additional highly relevant studies. Interestingly, the genes implicated in cold tolerance in these studies, FOXO1 and ALAS1, did not appear essential for survival at 37°C or 4°C in BHK-21 or K562 cells. There are several possibilities that could explain this finding: 1) our screen may not have successfully knocked out these genes, 2) other proteins may have compensated for their loss, or 3) these pathways may regulate cold tolerance in some but not all cell types. We apologize that in the current version of the manuscript we did not reflect on these recent studies. We will expand our discussion to include their findings.

      Once again, we are grateful for the reviewer’s insights, which have highlighted key areas for further exploration as well as pointed to specific ways to improve our manuscript.

    1. Author Response

      Joint Public Review

      The molecular composition of synaptic vesicles (SVs) has been defined in substantial detail, but the functions of many SV-resident proteins are still unknown. The present study focused on one such protein, the 'orphan' SV-resident transporter SLC6A17. By utilizing sophisticated and extensive mouse genetics and behavioral experiments, the authors provide convincing support for the notion that certain SLC6A17 variants cause intellectual disability (ID) in humans carrying such genetic variations. This is an important and novel finding. Furthermore, the authors propose, based on LCMS analyses of isolated SVs, that SLC6A17 is responsible for glutamine (Gln) transport into SVs, leading to the provocative idea that Gln functions as a neurotransmitter and that deficits in Gln transport into SVs by SLC6A17 represent a key pathogenetic mechanism in human ID patients carrying variants of the SLC6A17 gene.

      This latter aspect of the present paper is not adequately supported by the experimental evidence so that the main conceptual claims of the study appear insufficiently justified at this juncture. Key weaknesses are as follows:

      A) Detection of Gln, along with classical neurotransmitters such as glutamate, GABA, or ACh, in isolated SV fractions does not prove that Gln is transported into SVs by active transport. Gln is quite abundant in extracellular compartments. Its appearance in SV samples can therefore also be explained by trapping in SVs during endocytosis, presence in other - contaminating - organelles, binding to membrane surfaces, and other processes. Direct assays of Gln uptake into SVs, which have the potential to stringently test key postulates of the authors, are lacking.

      We have conducted multiple control experiments to exclude the possibility of contamination.

      1). Western blot analysis of SLC6A17-HA immunoisolation (Figure 4D and Figure 4—figure supplement 1) has shown that this fraction contained few other organelles and membranes. These results strongly argue that contamination in our isolated fraction was very low.

      2). We then examined the proportion of SLC6A17 localized SVs through quantifying the co-localization of Syp and SLC6A17 by anti-Syp immunoisolation in Slc6a17-2A-HA-iCre mice. We found that SLC6A17 is predominately localized on SVs (with 98.7% compared with classical SV marker, Author response image 1A). This further showed that immunoisolated SLC6A17 fraction was mainly composed of SVs.

      3). We also analyzed other SV marker proteins such as Syt1 and Syb2 by IP-LC-MS; all results supported Gln enrichment (Author response image 1B).

      4). Importantly, immunoisolation of the SLC6A17P633R-HA protein, which caused SLC6A17 mislocalization away from the SVs (Figure 3B and Figure 3—figure supplement 1C, D), showed no Gln enrichment (Author response image 1C).

      5). Moreover, immunoisolation of AAV-PHP.eb overexpressed cytoplasmic membrane Gln transporter SLC38A1-HA did not show Gln enrichment (Author response image 1D).

      6). We also tested whether trafficking organelles such as the lysosome could enrich Gln. As shown in Author response image 1E, immunoisolation of AAV-PHP.eb overexpressed TMEM192-HA did not show Gln enrichment. For active transport, we tested the effects of the proton dissipator FCCP, the v-ATPase inhibitor NEM and the ΔpH dissipator nigericin. As shown in Author response image 1F and 1G, Gln levels were reduced by these inhibitors, supporting active transport of Gln.

      Author response image 1.

      Control experiments to test for contamination. A. Anti-Syp immunoisolation in Slc6a17-2A-HA-iCre mice. B. Quantification of Gln level in anti-Syt1 and anti-Syb2 immunoisolated fractions. C. Anti-HA immunoisolation in SLC6A17-2A-HA and Slc6a17P633R mice. D. Anti-HA immunoisolation in AAV-PHP.eb-hSyn-SLC38A1-HA overexpression mice. E. Anti-HA immunoisolation in AAV-PHP.eb-hSyn-TMEM192-HA overexpression mice. F. Anti-HA immunoisolation in SLC6A17-2A-HA mice under FCCP (50 μM) and NEM (200 μM). G. Anti-Syp immunoisolation in wild type mice under FCCP (50 μM) and nigericin (20 μM).

      B) The authors generated multiple potentially very useful genetic tools and models. However, the validation of these models is incomplete. Most importantly, it remains unclear whether the different mutations affect SLC6A17 expression levels, subcellular localization, or the expression and trafficking of other SV and synapse components.

      The verification of the transgenic mouse lines is described in the Materials and Methods section of our manuscript. CRISPR-mediated gene editing in animals is extensively documented, and the off-target effects of the CRISPR-Cas9 system have been widely studied, with optimized design tools developed by many groups (Platt et al., 2014; Chu et al., 2015, 2016; Liu et al., 2017; Gemberling et al., 2021; Singh et al., 2022). The gRNAs used for animal generation were chosen carefully based on publicly available tools. Apart from basic genomic PCR sequencing of target regions of all gene-edited mouse models, Southern blots were performed by Biocytogen for Slc6a17-HA-2A-iCre and Slc6a17P633R mice to rule out random insertions. Expression levels in Slc6a17-KO and Slc6a17P633R mice were not affected, as shown in Figure R2. HA-tagged proteins in Slc6a17-HA-2A-iCre and Slc6a17P633R mice were detected by immunoisolation, immunofluorescence, and fractionation (Figure 3, 4, Figure 3—figure supplement 1, Figure 4—figure supplement 1). Both showed localizations expected from previous reports ().

      C) Apart from the caveats mentioned above regarding Gln uptake into SVs, the data interpretation provided by the authors lacks stringency with respect to the biophysics of plasma membrane and SV transporters.

      The biophysics of SLC6A17 has been carefully studied (Parra et al., 2008; Zaia and Reimer, 2009). Our work focused on in vivo biochemical results, not biophysics.

      Author response image 2.

      Verification of genetic mouse models. A. q-PCR verification of Slc6a17-KO mice; B. q-PCR verification of Slc6a17P633R mice; C. Example of genomic primer design for the Slc6a17-HA-2A-iCre founder mouse screen; D. Example of genomic PCR for the Slc6a17-HA-2A-iCre founder mouse screen; E. Southern blot performed for Slc6a17-HA-2A-iCre mice.

      References

      Chu, Van Trung et al. “Increasing the efficiency of homology-directed repair for CRISPR-Cas9-induced precise gene editing in mammalian cells.” Nature biotechnology vol. 33,5 (2015): 543-8. doi:10.1038/nbt.3198

      Chu, Van Trung, et al. "Efficient generation of Rosa26 knock-in mice using CRISPR/Cas9 in C57BL/6 zygotes." BMC biotechnology 16.1 (2016): 1-15.

      Gemberling, Matthew P et al. “Transgenic mice for in vivo epigenome editing with CRISPR-based systems.” Nature methods vol. 18,8 (2021): 965-974. doi:10.1038/s41592-021-01207-2

      Liu, Edison T., et al. "Of mice and CRISPR: The post‐CRISPR future of the mouse as a model system for the human condition." EMBO reports 18.2 (2017): 187-193.

      Madisen, Linda, et al. "A robust and high-throughput Cre reporting and characterization system for the whole mouse brain." Nature neuroscience 13.1 (2010): 133-140.

      Parra, Leonardo A., et al. "The orphan transporter Rxt1/NTT4 (SLC6A17) functions as a synaptic vesicle amino acid transporter selective for proline, glycine, leucine, and alanine." Molecular pharmacology 74.6 (2008): 1521-1532.

      Platt, R.J., Chen, S., Zhou, Y., Yim, M.J., Swiech, L., Kempton, H.R., Dahlman, J.E., Parnas, O., Eisenhaure, T.M., Jovanovic, M., et al. (2014). CRISPR-Cas9 knockin mice for genome editing and cancer modeling. Cell 159, 440-455.

      Yang, Hui, Haoyi Wang, and Rudolf Jaenisch. "Generating genetically modified mice using CRISPR/Cas-mediated genome engineering." Nature protocols 9.8 (2014): 1956-1968.

      Singh, Surender et al. “Opportunities and challenges with CRISPR-Cas mediated homologous recombination based precise editing in plants and animals.” Plant molecular biology (2022). doi:10.1007/s11103-022-01321-5

      Zaia, K.A., and Reimer, R.J. (2009). Synaptic vesicle protein NTT4/XT1 (SLC6A17) catalyzes Na+-coupled neutral amino acid transport. J Biol Chem 284, 8439-8448.

    1. Author response:

      We would like to thank the editors and the reviewers for their constructive feedback on the first version of our manuscript. Before submitting a fully revised version with a detailed response to each point, we would like to provide a brief clarification on some of the key issues.

      Reviewer 2 raised a concern about the precision and specificity of holographic stimulation, regarding its potential effect on out-of-focus stimulation points and planes. We further verified whether the laser power at the targeted z-plane influences cells’ activity at nearby z-planes. As the Reviewer pointed out, the previous x- and y-axis shifts were tested by single-cell stimulation. This time, we stimulated five cells simultaneously, to match the actual experimental setup and assess potential artifacts in other planes. We observed no stimulation-driven activity increase in cells at a z-plane shifted by 20 µm (Author response image 1). This confirms that the holographic stimulation accurately manipulates the pre-selected target cells and that the effects we observe are unlikely to be due to out-of-focus stimulation artifacts. It is true that not all of the pre-selected cells that showed significant response changes prior to the main experiment were effectively activated at every trial during the experiments. While further analyses will be included in the revised manuscript, we varied the target cell distances across FOVs, from nearby cells to those farther apart within the FOV. We have not observed a significant relationship between the target cell distances and the stimulation effect. Lastly, cells within 15 µm of the target were excluded to prevent potential excitation due to the holographic stimulation power. Given the spontaneous movements of the FOV during imaging sessions due to the animal’s movement, despite our efforts to minimize them, we believe that any excitation from these neighboring neurons would be directly from the stimulation rather than the light pattern artifact itself.

      Author response image 1.

      Stimulation effect on five pre-selected cells at the target z-plane (left) and 20 µm off-target z-plane (right). No stimulation-driven effect was observed on the off-target cells.

      Reviewers also raised concerns regarding the interpretation of homeostatic balance. While we are working on further analyses to strengthen our findings based on the reviewers’ suggestions, the observed response changes in co-tuned neuronal ensembles, specifically during the processing of their preferred frequency information, highlight an interaction between sensory processing and network dynamics. We believe this specificity indicates a functional mechanism beyond broad suppression or simple inhibitory effects, possibly aligning with homeostatic principles in cortical circuits. Regarding the post-stimulation effect, it is true that neither the stimulation nor the control condition showed further response changes during the post-stimulation session. For the control condition, this is likely because the repetitive tone presentation had already driven neural adaptation to a plateau during the first two imaging sessions (baseline and stimulation sessions), preventing further changes in the last session. However, the stimulation condition induced a greater amplitude decrease during the stimulation session than the control condition did. If this extra suppression had not persisted during the post-stimulation session, we would have expected response amplitudes to rebound, increasing between the stimulation and post-stimulation sessions, which was not the case. Therefore, we propose that the persistence of this rebalanced network state is more indicative of a potential homeostatic mechanism in response to the activity manipulation within the network.

    1. Author response:

      Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility, and clarity):

      The work by Pinon et al describes the generation of a microvascular model to study Neisseria meningitidis interactions with blood vessels. The model uses a novel and relatively high throughput fabrication method that allows full control over the geometry of the vessels. The model is well characterized. The authors then study different aspects of Neisseria-endothelial interactions and benchmark the bacterial infection model against the best disease model available, a human skin xenograft mouse model, which is one of the great strengths of the paper. The authors show that Neisseria binds to the 3D model in a similar geometry as in the animal xenograft model, induces an increase in permeability shortly after bacterial perfusion, and induces endothelial cytoskeleton rearrangements. Finally, the authors show neutrophil recruitment to bacterial microcolonies and phagocytosis of Neisseria. The article is overall well written, and it is a great advancement in the bioengineering and sepsis infection field, and I only have a few major comments and some minor ones.

      Major comments:

      Infection-on-chip. I would recommend the authors to change the terminology of "infection on chip" to better reflect their work. The term is vague and it decreases novelty, as there are multiple infection-on-chip models that recapitulate other infections (recently reviewed in https://doi.org/10.1038/s41564-024-01645-6) including Ebola, SARS-CoV-2, Plasmodium and Candida. Maybe the term "sepsis on chip" would be more specific and exemplify better the work and novelty. Also, I would suggest that the authors carefully take a look at the text and consider when they use VoC or the current term IoC, as of now sometimes they are used interchangeably, with VoC being used occasionally in bacteria perfused experiments.

      We thank Reviewer #1 for this suggestion. Indeed, we have chosen to replace the term "Infection-on-Chip" by "infected Vessel-on-chip" to avoid any confusion in the title and the text. Also, we have removed all the terms "IoC" which referred to "Infection-on-Chip" and replaced with "VoC" for "Vessel-on-Chip". We think these terms will improve the clarity of the main text.

      Author response image 1.

      F-actin (red) and ezrin (yellow) staining after 3h of infection with N. meningitidis (green) in 2D (top) and 3D (bottom) vessel-on-chip models.

      Fig 3 and Supplementary 3: Permeability. The authors suggest that early 3h infection with Neisseria do not show increase in vascular permeability in the animal model, contrary to their findings in the 3D in vitro model. However, they show a non-significant increase in permeability of 70 KDa Dextran in the animal xenograft early infection. This seems to point that if the experiment would have been done with a lower molecular weight tracer, significant increases in permeability could have been detected. I would suggest to do this experiment that could capture early events in vascular disruption.

      Comparing permeability under healthy and infected conditions using dextran smaller than 70 kDa is challenging. Previous research (1) has shown that molecules below 70 kDa already diffuse freely in healthy tissue. Given this high baseline diffusion, we believe that no significant difference would be observed before and after N. meningitidis infection, and these experiments were not carried out. As discussed in the manuscript, bacteria-induced permeability in the mouse occurs at later time points, 16h post infection, as shown previously (2). This difference between the xenograft model and the chip likely reflects the absence in the chip of various cell types present in the tissue parenchyma.

      The authors show the formation of actin of a honeycomb structure beneath the bacterial microcolonies. This only occurred in 65% of the microcolonies. Is this result similar to in vitro 2D endothelial cultures in static and under flow? Also, the group has shown in the past positive staining of other cytoskeletal proteins, such as ezrin in the ERM complex. Does this also occur in the 3D system?

      We thank the Reviewer #1 for this suggestion.

      • According to this recommendation, we imaged monolayers of endothelial cells in the flat regions of the chip (the two lateral channels) using the same microscopy conditions (i.e., Obj. 40X N.A. 1.05) that have been used to detect honeycomb structures in the 3D vessels in vitro. We showed that more than 56% of infected cells present these honeycomb structures in 2D, which is 13% less than in 3D, and is not significant due to the distributions of both populations. Thus, we conclude that under both in vitro conditions, 2D and 3D, the amount of infected cells exhibiting cortical plaques is similar. We have added the graph and the confocal images in Figure S4B and lines 418-419 of the revised manuscript.

      • We recently performed staining of ezrin in the chip and imaged both the 3D and 2D regions. Although ezrin staining was visible in 3D (Fig. 1 of this response), it was not as obvious as other markers under these infected conditions and we did not include it in the main text. Interpretation of this result is not straightforward because, for instance, the substrate of the cells is different, and it would require further studies on the behaviour of ERM proteins in these different contexts.

      One of the most novel things of the manuscript is the use of a relatively quick photoablation system. I would suggest that the authors add a more extensive description of the protocol in methods. Could this technique be applied in other laboratories? If this is a major limitation, it should be listed in the discussion.

      Following the Reviewer’s comment, we introduced more detailed explanations regarding the photoablation:

      • L157-163 (Results): "Briefly, the chosen design is digitalized into a list of positions to ablate. A pulsed UV-LASER beam is injected into the microscope and shaped to cover the back aperture of the objective. The laser is then focused on each position that needs ablation. After introducing endothelial cells (HUVEC) in the carved regions,…"

      • L512-516 (Discussion): "The speed capabilities drastically improve with the pulsing repetition rate. Given that our laser source emits pulses at 10kHz, as compared to other photoablation lasers with repetitions around 100 Hz, our solution could potentially gain a factor of 100."

      • L1082-1087 (Materials and Methods): "…, and imported in a python code. The control of the various elements is embedded and checked for this specific set of hardware. The code is available upon request." Adding these three paragraphs gives more details on how photoablation works thus improving the manuscript.
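
      The factor of 100 quoted from the Discussion follows directly from the ratio of repetition rates if ablation time is limited only by pulse delivery. A minimal sketch under that assumption is given below; the number of positions and pulses per position are hypothetical, and stage or beam-steering overhead is ignored.

```python
def ablation_time_s(n_positions, pulses_per_position, repetition_rate_hz):
    """Time to deliver all pulses, assuming pulse delivery is the only bottleneck."""
    if repetition_rate_hz <= 0:
        raise ValueError("repetition rate must be positive")
    return n_positions * pulses_per_position / repetition_rate_hz


# Hypothetical design: one million positions, one pulse each.
n_positions = 1_000_000
pulses_per_position = 1

slow_s = ablation_time_s(n_positions, pulses_per_position, 100.0)     # ~100 Hz source
fast_s = ablation_time_s(n_positions, pulses_per_position, 10_000.0)  # 10 kHz source
speedup = slow_s / fast_s

print(f"100 Hz: {slow_s:.0f} s, 10 kHz: {fast_s:.0f} s, speed-up: {speedup:.0f}x")
```

      In practice the achievable gain would also depend on how quickly the beam can be repositioned between points, which this sketch does not model.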

      Minor comments:

      Supplementary Fig 2. The reference to subpanels H and I is swapped.

      The references to subpanels H and I have been corrected in the revised version.

      Line 203: I would suggest to delete this sentence. Although a strength of the submitted paper is the direct comparison of the VoC model with the animal model to better replicate Neisseria infection, a direct comparison with animal permeability is not needed in all vascular engineering papers, as vascular permeability measurements in animals have been well established in the past.

      The sentence "While previously developed VoC platforms aimed at replicating physiological permeability properties, they often lack direct comparisons with in vivo values." has been removed from the revised text.

      Fig 3: Bacteria binding experiments. I would suggest the addition of more methodological information in the main results text to guarantee a good interpretation of the experiment. First, it would be better that wall shear stress rather than flow rate is described in the main text, as flow rate is dependent on the geometry of the vessel being used. Second, how long was the perfusion of Neisseria in the binding experiment performed to quantify colony doubling or elongation? As per figure 1C, I would guess than 100 min, but it would be better if this information is directly given to the readers.

      We thank Reviewer #1 for these two suggestions, which improve the clarity of the text (e.g., L316). (i) We now express the flow rate in terms of shear stress. (ii) We have also normalized the quantification of the colony doubling time to the first time point at which a single bacterium is attached to the vessel wall. Thus, early-adhering bacteria are represented by a longer curve and late-adhering bacteria by a shorter curve. In total, the experiment lasted for 3 hours (modifications appear in L318 and L321-326).
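
      To illustrate why wall shear stress is more informative than flow rate, the sketch below converts a flow rate to wall shear stress using the Poiseuille relation for steady laminar flow of a Newtonian fluid in a rigid cylindrical tube, tau = 4 * mu * Q / (pi * r^3). The flow rate, radii, and viscosity are hypothetical values chosen for illustration, not the parameters of the chip described in the manuscript.

```python
import math


def wall_shear_stress_pa(flow_ul_per_min, radius_um, viscosity_pa_s):
    """Wall shear stress (Pa) for Poiseuille flow in a cylindrical vessel.

    tau = 4 * mu * Q / (pi * r**3), with Q in m^3/s and r in m.
    """
    flow_m3_per_s = flow_ul_per_min * 1e-9 / 60.0  # 1 µL = 1e-9 m^3
    radius_m = radius_um * 1e-6
    return 4.0 * viscosity_pa_s * flow_m3_per_s / (math.pi * radius_m ** 3)


# Hypothetical values: same flow rate through two vessel radii.
viscosity = 0.7e-3  # Pa·s, assumed for culture medium at 37°C
tau_narrow = wall_shear_stress_pa(1.0, 25.0, viscosity)
tau_wide = wall_shear_stress_pa(1.0, 50.0, viscosity)

print(f"r = 25 µm: {tau_narrow:.2f} Pa")
print(f"r = 50 µm: {tau_wide:.2f} Pa")
```

      Because shear stress scales with the inverse cube of the radius, doubling the radius at a fixed flow rate lowers the wall shear stress eightfold, which is why a flow rate alone does not specify the mechanical conditions at the vessel wall.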

      Fig 4: The honeycomb structure is not visible in the 3D rendering of panel D. I would recommend to show the actin staining in the absence of Neisseria staining as well.

      Following this suggestion, a zoom of the 3D rendering of the cortical plaque without the colony has been added to Figure 4 of the revised manuscript.

      Line 421: E-selectin is referred as CD62E in this sentence. I would suggest to use the same terminology everywhere.

      We have replaced the "CD62E" term with "E-selectin" to improve clarity.

      Line 508: "This difference is most likely associated with the presence of other cell types in the in vivo tissues and the onset of intravascular coagulation". Do the authors refer to the presence of perivascular cells, pericytes or fibroblasts? If so, it could be good to mention them, as well as those future iterations of the model could include the presence of these cell types.

      By "other cell types", we refer to pericytes (3), fibroblasts (4), and perivascular macrophages (5), which surround endothelial cells and contribute to vessel stability. The main text was modified to include this information (Lines 548 and 555-570), and their potential roles during infection are discussed.

      Discussion: The discussion covers very well the advantages of the model over in vitro 2D endothelial models and the animal xenograft but fails to include limitations. This would include the choice of HUVEC cells, an umbilical vein cell line to study microcirculation, the lack of perivascular cells or limitations on the fabrication technique regarding application in other labs (if any).

      We thank Reviewer #1 for this suggestion. Indeed, our manuscript did not sufficiently discuss its limitations, and adding them to the text helps improve it:

      • The perspectives of our model include introducing perivascular cells surrounding the vessel and fibroblasts into the collagen gel as discussed previously and added in the discussion part (L555-570).

      • Our choice for HUVEC cells focused on recapitulating the characteristics of venules that respect key features such as the overexpression of CD62E and adhesion of neutrophils during inflammation. Using microvascular endothelial cells originating from different tissues would be very interesting. This possibility is now mentioned in the discussion lines 567-568.

      • Photoablation is a homemade fabrication technique that can be implemented in any lab equipped with an epifluorescence microscope. This method is described in more detail in the revised manuscript (L1085-1087).

      Line 576: The authors state that the model could be applied to other systemic infections but failed to mention that some infections have already been modelled in 3D bioengineered vascular models (examples found in https://doi.org/10.1038/s41564-024-01645-6). This includes a capillary photoablated vascular model to study malaria (DOI: 10.1126/sciadv.aay724).

      These two important references have been introduced in the main text (L84, 647, 648).

      Line 1213: Are the 6M neutrophil solution in 10ul under flow. Also, I would suggest to rewrite this sentence in the following line "After, the flow has been then added to the system at 0.7-1 µl/min."

      We now specify that neutrophils are circulated in the chip under flow conditions (lines 1321-1322).

      Significance

      The manuscript is comprehensive, complete and represents the first bioengineered model of sepsis. One of the major strengths is the careful characterization and benchmarking against the animal xenograft model. Its main limitation is the brief description of the photoablation methodology, and more clarity is needed in the description of bacteria perfusion experiments, given their complexity. The manuscript will be of interest for the general infection community and to the tissue engineering community if more details on fabrication methods are included. My expertise is on infection bioengineered models.

      Reviewer #2 (Evidence, reproducibility, and clarity):

      Summary:

      The authors develop a Vessel-on-Chip model, which has geometrical and physical properties similar to the murine vessels used in the study of systemic infections. The vessel was created via highly controllable laser photoablation in a collagen matrix, subsequent seeding of human endothelial cells and flow perfusion to induce mechanical cues. This vessel could be infected with Neisseria meningitidis, as a model of systemic infection. In this model, microcolony formation and dynamics, and effects on the host were very similar to those described for the human skin xenograft mouse, which is the current gold standard for these studies, and were consistent with observations made in patients. The model could also recapitulate the neutrophil response upon N. meningitidis systemic infection.

      Major comments:

      I have no major comments. The claims and the conclusions are supported by the data, the methods are properly presented and the data is analyzed adequately. Furthermore, I would like to propose an optional experiment that could improve the manuscript. In the discussion it is stated that the vascular geometry might contribute to bacterial colonization in areas of lower velocity. It would be interesting to recapitulate this experimentally. It is of course optional but it would be of great interest, since this is something that can only be proven in the organ-on-chip (where flow speed can be tuned) and not as much in animal models. Besides, it would increase impact, demonstrating the superiority of the chip in this area rather than proving to be equal to current models.

      We have conducted additional experiments on infection in different vascular geometries and have now added these results in Figure 3/S3 and lines 288-305. We compared shear stress levels, as determined by COMSOL simulation, with experimentally determined bacterial adhesion sites. In the conditions used, the range of shear generated by the tested geometries does not appear to change the efficiency of bacterial adhesion. These results are consistent with a previous study from our group, which showed that in this range of shear stresses the effect on adhesion is limited (6). Furthermore, qualitative observations in the animal model indicate that bacteria do not have an obvious preference in terms of binding site.
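
      For a rough sense of the shear range discussed here, the wall shear stress in a straight cylindrical channel under steady laminar (Poiseuille) flow can be estimated analytically as tau = 4*mu*Q / (pi*R^3). The sketch below is only an illustration and is not the COMSOL simulation: the viscosity and vessel radius are assumed values that are not taken from the manuscript, and the flow rates are simply the 0.7-1 µl/min range quoted in Reviewer #1's comment.

```python
import math

def wall_shear_stress(flow_ul_per_min, radius_um, viscosity_pa_s=0.7e-3):
    """Wall shear stress (Pa) for Poiseuille flow in a cylindrical tube.

    tau = 4 * mu * Q / (pi * R^3)
    viscosity_pa_s defaults to an ASSUMED value for culture medium at 37 C.
    """
    q_m3_per_s = flow_ul_per_min * 1e-9 / 60.0  # 1 µl = 1e-9 m^3
    r_m = radius_um * 1e-6                      # µm -> m
    return 4.0 * viscosity_pa_s * q_m3_per_s / (math.pi * r_m ** 3)

# ASSUMED vessel radius of 25 µm, chosen for illustration only.
for q in (0.7, 1.0):
    tau = wall_shear_stress(q, radius_um=25.0)
    print(f"Q = {q} µl/min -> wall shear stress ~ {tau:.2f} Pa")
```

      Because the stress scales with 1/R^3, modest changes in local vessel radius alter the wall shear far more than comparable changes in flow rate, which is why geometry is the natural variable to test.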

      Minor comments:

      I have a series of suggestions which, in my opinion, would improve the discussion. They are further elaborated in the following section, in the context of the limitations.

      • How to recapitulate the vessels in the context of a specific organ or tissue? If the pathogen is often found in the luminal space of other organs after disseminating from the blood, how can this process be recapitulated with this model, if at all?

      For reasons that are not fully understood, postmortem histological studies reveal bacteria only inside blood vessels but rarely, if ever, in the organ parenchyma. The presence of intravascular bacteria could nevertheless alter cells in the tissue parenchyma. The notable exception is the brain, where bacteria exit the vascular lumen to access the cerebrospinal fluid. The chip we describe is fully adapted to develop a blood-brain barrier model and more specific organ environments. This implies the addition of more cell types in the hydrogel. A paragraph on this topic has been added (Lines 548 and 552-570).

      • Similarly, could other immune responses related to systemic infection be recapitulated? The authors could discuss the potential of including other immune cells that might be found in the interstitial space, for example.

      This important discussion point has been added to the manuscript (L623-636). As suggested by Reviewer #2, other immune cells respond to N. meningitidis and can be explored using our model. For instance, macrophages and dendritic cells are activated upon N. meningitidis infection, eliminate the bacteria through phagocytosis, and produce pro-inflammatory cytokines and chemokines that potentially activate lymphocytes (7). Such an immune response, although complex, would be interesting to study in our model, as skin-xenograft mice are deprived of B and T lymphocytes to ensure acceptance of human skin grafts.

      • A minor correction: in line 467 it should probably be "aspects" instead of "aspect", and the authors could consider rephrasing that sentence slightly for increased clarity.

      We have corrected the sentence with "we demonstrated that our VoC strongly replicates key aspects of the in vivo human skin xenograft mouse model, the gold standard for studying meningococcal disease under physiological conditions." in lines 499-503.

      Strengths and limitations

      The most important strength of this manuscript is the technology they developed to build this model, which is impressive and very innovative. The Vessel-on-Chip can be tuned to acquire complex shapes and, according to the authors, the process has been optimized to produce models very quickly. This is a great advancement compared with the technologies used to produce other equivalent models. This model proves to be equivalent to the most advanced model used to date, but allows microscopy to be performed with higher resolution and ease, which can in turn allow more complex and precise image-based analysis. However, the authors do not seem to present any new mechanistic insights obtained using this model. All the findings obtained in the infection-on-chip demonstrate that the model is equivalent to the human skin xenograft mouse model, and can offer superior resolution for microscopy. However, the advantages of the model do not seem to be exploited to obtain more insights on the pathogenicity mechanisms of N. meningitidis, host-pathogen interactions or potential applications in the discovery of potential treatments. For example, experiments to elucidate the role of certain N. meningitidis genes on infection could enrich the manuscript and prove the superiority of the model. However, I understand these experiments are time-consuming and out of the scope of the current manuscript. In addition, the model lacks the multicellularity that characterizes other similar models. The authors mention that the pathogen can be found in the luminal space of several organs; however, this luminal space has not been recapitulated in the model. Even though this would be a new project, it would be interesting for the authors to hypothesize about the possibilities of combining this model with other organ models. The inclusion of circulating neutrophils is a great asset; however, it would also be interesting to hypothesize about how to recapitulate other immune responses related to systemic infection.

      We thank Reviewer #2 for their comments on the strengths and limitations of our work. Our study opens many future research directions and applications, and we hope that it will serve as the basis for many future studies, but only a limited set of experiments can be addressed in a single manuscript.

      • Experiments investigating the role of N. meningitidis genes require significant optimization of the system. Multiplexing is a potential avenue for future development, which would allow the testing of many mutants. The fast photoablation approach is particularly amenable to such adaptation.

      • Cells and bacteria inside the chambers could be isolated and analyzed at the transcriptomic level or by flow cytometry. This would imply optimizing a protocol for collecting cells from the device via collagenase digestion, for instance. This type of approach would also benefit from multiplexing to enhance the number of cells.

      • As mentioned above, the revised manuscript discusses the multicellular capabilities of our model, including the integration of additional immune cells and potential connections to other organ systems. We believe that these approaches are feasible and valuable for studying various aspects of N. meningitidis infection.

      Advance

      The most important advance of this manuscript is technical: the development of a model that proves to be equivalent to the most complex model used to date to study meningococcal systemic infections. The human skin xenograft mouse model requires complex surgical techniques and has the practical and ethical limitations associated with the use of animals. However, the Infection-on-chip model is completely in vitro, can be produced quickly, and allows to precisely tune the vessel’s geometry and to perform higher resolution microscopy. Both models were comparable in terms of the hallmarks defining the disease, suggesting that the presented model can be an effective replacement of the animal use in this area.

      Other vessel-on-chip models can recapitulate an endothelial barrier in a tube-like morphology, but do not recapitulate other complex geometries, that are more physiologically relevant and could impact infection (in addition to other non-infectious diseases). However, in the manuscript it is not clear whether the different morphologies are necessary to study or recapitulate N. meningitidis infection, or if the tubular morphologies achieved in other similar models would suffice.

      Audience

      This manuscript might be of interest for a specialized audience focusing on the development of microphysiological models. The technology presented here can be of great interest to researchers whose main area of interest is the endothelium and the blood vessels, for example, researchers on the study of systemic infections, atherosclerosis, angiogenesis, etc. Thus, the tool presented (vessel-on-chip) can have great applications for a broad audience. However, even when the method might be faster and easier to use than other equivalent methods, it could still be difficult to implement in another laboratory, especially if it lacks expertise in bioengineering. Therefore, the method could be more of interest for laboratories with expertise in bioengineering looking to expand or optimize their toolbox. Alternatively, this paper presents itself as an opportunity to begin collaborations, since the model could be used to test other pathogens or conditions.

      Field of expertise:

      Infection biology, organ-on-chip, fungal pathogens.

      I lack the expertise to evaluate the image-based analysis.

      References

      (1) Gyohei Egawa, Satoshi Nakamizo, Yohei Natsuaki, Hiromi Doi, Yoshiki Miyachi, and Kenji Kabashima. Intravital analysis of vascular permeability in mice using two-photon microscopy. Scientific Reports, 3(1):1932, Jun 2013. ISSN 2045-2322. doi: 10.1038/srep01932.

      (2) Valeria Manriquez, Pierre Nivoit, Tomas Urbina, Hebert Echenique-Rivera, Keira Melican, Marie-Paule Fernandez-Gerlinger, Patricia Flamant, Taliah Schmitt, Patrick Bruneval, Dorian Obino, and Guillaume Duménil. Colonization of dermal arterioles by Neisseria meningitidis provides a safe haven from neutrophils. Nature Communications, 12(1):4547, Jul 2021. ISSN 2041-1723. doi: 10.1038/s41467-021-24797-z.

      (3) Mats Hellström, Holger Gerhardt, Mattias Kalén, Xuri Li, Ulf Eriksson, Hartwig Wolburg, and Christer Betsholtz. Lack of pericytes leads to endothelial hyperplasia and abnormal vascular morphogenesis. Journal of Cell Biology, 153(3):543–554, Apr 2001. ISSN 0021-9525. doi: 10.1083/jcb.153.3.543.

      (4) Arsheen M. Rajan, Roger C. Ma, Katrinka M. Kocha, Dan J. Zhang, and Peng Huang. Dual function of perivascular fibroblasts in vascular stabilization in zebrafish. PLOS Genetics, 16(10):1–31, 10 2020. doi: 10.1371/journal.pgen.1008800.

      (5) Huanhuan He, Julia J. Mack, Esra Güç, Carmen M. Warren, Mario Leonardo Squadrito, Witold W. Kilarski, Caroline Baer, Ryan D. Freshman, Austin I. McDonald, Safiyyah Ziyad, Melody A. Swartz, Michele De Palma, and M. Luisa Iruela-Arispe. Perivascular macrophages limit permeability. Arteriosclerosis, Thrombosis, and Vascular Biology, 36(11):2203–2212, 2016. doi: 10.1161/ATVBAHA.116.307592.

      (6) Emilie Mairey, Auguste Genovesio, Emmanuel Donnadieu, Christine Bernard, Francis Jaubert, Elisabeth Pinard, Jacques Seylaz, Jean-Christophe Olivo-Marin, Xavier Nassif, and Guillaume Duménil. Cerebral microcirculation shear stress levels determine Neisseria meningitidis attachment sites along the blood–brain barrier. Journal of Experimental Medicine, 203(8):1939–1950, 07 2006. ISSN 0022-1007. doi: 10.1084/jem.20060482.

      (7) Riya Joshi and Sunil D. Saroj. Survival and evasion of Neisseria meningitidis from macrophages. Medicine in Microecology, 17:100087, 2023. ISSN 2590-0978. doi: 10.1016/j.medmic.2023.100087.

    1. Author Response:

      Assessment note: “Whereas the results and interpretations are generally solid, the mechanistic aspect of the work and conclusions put forth rely heavily on in vitro studies performed in cultured L6 myocytes, which are highly glycolytic and generally not viewed as a good model for studying muscle metabolism and insulin action.”

      While we acknowledge that in vitro models may not fully recapitulate the complexity of in vivo systems, we believe that our use of L6 myotubes is appropriate for studying the mechanisms underlying muscle metabolism and insulin action. As mentioned below (reviewer 2, point 1), L6 myotubes possess many important characteristics relevant to our research, including high insulin sensitivity and a similar mitochondrial respiration sensitivity to primary muscle fibres. Furthermore, several studies have demonstrated the utility of L6 myotubes as a model for studying insulin sensitivity and metabolism, including our own previous work (PMID: 19805130, 31693893, 19915010).

      In addition, we have provided evidence of the similarities between L6 cells overexpressing SMPD5 and human muscle biopsies at the protein level, and of the reproducibility in vivo of the negative correlation between ceramide and Coenzyme Q observed in L6 cells, specifically in the skeletal muscle of mice on a chow diet. These findings support the relevance of our in vitro results to in vivo muscle metabolism.

      Finally, we will supplement our findings by demonstrating a comparable relationship between ceramide and Coenzyme Q in mice exposed to a high-fat diet, to be shown in Supplementary Figure 4 H-I. Further animal experiments will be performed to validate our cell-line based conclusions. We hope that these additional results address the concerns raised by the reviewer and further support the relevance of our in vitro findings to in vivo muscle metabolism and insulin action.

      Points from reviewer 1:

      1. Although the authors' results suggest that higher mitochondrial ceramide levels suppress cellular insulin sensitivity, they rely solely on a partial inhibition (i.e., 30%) of insulin-stimulated GLUT4-HA translocation in L6 myocytes. It would be critical to examine how much the increased mitochondrial ceramide would inhibit insulin-induced glucose uptake in myocytes using radiolabel deoxy-glucose.

      Response: The primary impact of insulin is to facilitate the translocation of glucose transporter type 4 (GLUT4) to the cell surface, which effectively enhances the maximum rate of glucose uptake into cells. Therefore, assessing the quantity of GLUT4 present at the cell surface in non-permeabilized cells is widely regarded as the most reliable measure of insulin sensitivity (PMID: 36283703, 35594055, 34285405). Additionally, plasma membrane GLUT4 and glucose uptake are highly correlated. Whilst we have routinely measured glucose uptake with radiolabelled glucose in the past, we do not believe that evaluating glucose uptake provides a better assessment of insulin sensitivity than GLUT4.

      We will clarify the use of GLUT4 translocation in the Results section:

      “...For this reason, several in vitro models have been employed involving incubation of insulin sensitive cell types with lipids such as palmitate to mimic lipotoxicity in vivo. In this study we will use cell surface GLUT4-HA abundance as the main readout of insulin response...”

      2. Another important question to be addressed is whether glycogen synthesis is affected in myocytes under these experimental conditions. Results demonstrating reductions in insulin-stimulated glucose transport and glycogen synthesis in myocytes with dysfunctional mitochondria due to ceramide accumulation would further support the authors' claim.

      Response: We have carried out supplementary experiments to investigate glycogen synthesis in our insulin-resistant models. Our approach involved L6-myotubes overexpressing the mitochondrial-targeted construct ASAH1 (as described in Fig. 3). We then challenged them with palmitate and measured glycogen synthesis using 14C radiolabeled glucose. Our observations indicated that palmitate suppressed insulin-induced glycogen synthesis, which was effectively prevented by the overexpression of ASAH1 (N = 5, * p<0.05). These results provide additional evidence highlighting the role of dysfunctional mitochondria in muscle cell glucose metabolism.

      These data will be added to Supplementary Figure 4K and the results modified as follows:

      “Notably, mtASAH1 overexpression protected cells from palmitate-induced insulin resistance without affecting basal insulin sensitivity (Fig. 3E). Similar results were observed using insulin-induced glycogen synthesis as an orthogonal technique to Glut4 translocation. These results provide additional evidence highlighting the role of dysfunctional mitochondria in muscle cell glucose metabolism (Sup. Fig. 5K). Importantly, mtASAH1 overexpression did not rescue insulin sensitivity in cells depleted…”

      We will add to the method section:

      “L6 myotubes overexpressing ASAH1 were grown and differentiated in 12-well plates, as described in the Cell lines section, and stimulated for 16 h with palmitate-BSA or EtOH-BSA, as detailed in the Induction of insulin resistance section.

      On day seven of differentiation, myotubes were serum starved in plain DMEM for 3.5 h. After incubation for 1 h at 37 °C with 2 µCi/ml D-[U-14C]-glucose in the presence or absence of 100 nM insulin, the glycogen synthesis assay was performed as previously described (Zarini S. et al., J Lipid Res, 63(10): 100270, 2022).”

      3. In addition, it would be critical to assess whether the increased mitochondrial ceramide and consequent lowering of energy levels affect all exocytic pathways in L6 myoblasts or just the GLUT4 trafficking. Is the secretory pathway also disrupted under these conditions?

      Response: As the secretory pathway primarily involves the synthesis and transportation of soluble proteins that are secreted into the extracellular space, and given that the majority of cellular transmembrane proteins (excluding those of the mitochondria) use this pathway to arrive at their ultimate destination, we believe that the question posed by the reviewer is highly challenging and beyond the scope of our research. We will add this to the discussion:

      “...the abundance of mPTP associated proteins suggesting a role of this pore in ceramide induced insulin resistance (Sup. Fig. 6E). In addition, it is yet to be determined whether the trafficking defect is specific to Glut4 or if it affects the exocytic-secretory pathway more broadly…”

      Points from reviewer 2:

      1. The mechanistic aspect of the work and conclusions put forth rely heavily on studies performed in cultured myocytes, which are highly glycolytic and generally viewed as a poor model for studying muscle metabolism and insulin action. Nonetheless, the findings provide a strong rationale for moving this line of investigation into mouse gain/loss of function models.

      Response: The relative contributions of anaerobic (glycolytic) and aerobic (mitochondrial) metabolism in L6 cells can change depending on differentiation stage. For instance, Serrage et al. (PMID30701682) demonstrated that L6-myotubes have a higher mitochondrial abundance and aerobic metabolism than L6-myoblasts. Others have used elegant transcriptomic analysis and metabolic characterisation to compare different skeletal muscle models for studying insulin sensitivity. For instance, Abdelmoez et al. in 2020 (PMID31825657) reported that L6 myotubes exhibit greater insulin-stimulated glucose uptake and oxidative capacity compared with C2C12 and human skeletal muscle cells (HSMC). Overall, L6 cells exhibit higher metabolic rates and primarily rely on aerobic metabolism, while C2C12 and HSMC cells rely on anaerobic glycolysis. It is worth noting that L6 myotubes are the cell line most closely related to adult human muscle when compared with other muscle cell lines (PMID31825657). Our results in Figure 6 H and I provide evidence for the similarities between L6 cells overexpressing SMPD5 and human muscle biopsies. Additionally, in Figure 3J-K, we demonstrate the reproducibility in vivo of the negative correlation between ceramide and Coenzyme Q observed in L6 cells, specifically in the skeletal muscle of mice on a chow diet. Furthermore, we have supplemented these findings by demonstrating a comparable relationship in mice exposed to a high-fat diet, as shown in Supplementary Figure 4 H-I (refer to point 4). We will clarify these points in the Discussion:

      “In this study, we mainly utilised L6-myotubes, which share many important characteristics with primary muscle fibres relevant to our research. Both types of cells exhibit high sensitivity to insulin and respond similarly to maximal doses of insulin, with Glut4 translocation stimulated 2- to 4-fold over basal levels in response to 100 nM insulin (as shown in Fig. 1-4 and (46,47)). Additionally, mitochondrial respiration in L6-myotubes has a similar sensitivity to mitochondrial poisons to that observed in primary muscle fibres (as shown in Fig. 5 (48)). Finally, inhibiting ceramide production increases CoQ levels in both L6-myotubes and adult muscle tissue (as shown in Fig. 2-3). Therefore, L6-myotubes possess the necessary metabolic features to investigate the role of mitochondria in insulin resistance, and this relationship is likely applicable to primary muscle fibres”.

      We will also add additional data - in point 2 - from differentiated human myocytes that are consistent with our observations from the L6 models. Additional experiments are in progress to further extend these findings.

      2. One caveat of the approach taken is that exposure of cells to palmitate alone is not reflective of in vivo physiology. It would be interesting to know if similar effects on CoQ are observed when cells are exposed to a more physiological mixture of fatty acids that includes a high ratio of palmitate, but better mimics in vivo nutrition.

      Response: Palmitate is widely recognized as a trigger for insulin resistance and ceramide accumulation, which mimics the diet-induced insulin resistance seen in rodents and humans. Previous studies have compared the effects of a lipid mixture versus palmitate on inducing insulin resistance in skeletal muscle, and have found that the strong disruption in insulin sensitivity caused by palmitate exposure was lessened with physiologic mixtures of fatty acids, even with a high proportion of saturated fatty acids. This was attributed, in part, to the selective partitioning of fatty acids into neutral lipids (such as TAG) when muscle cells are exposed to physiologic lipid mixtures (Newsom et al., PMID25793412). Hence, we think that using palmitate is a better strategy to study lipid-induced insulin resistance in vitro. We will add to results:

      “In vitro, palmitate conjugated with BSA is the preferred strategy for inducing insulin resistance, as lipid mixtures tend to partition into triacylglycerides (33)”.

      We are also performing additional in vivo experiments to add to the physiological relevance of the findings.

      3. While the utility of targeting SMPD5 to the mitochondria is appreciated, the results in Figure 5 suggest that this manoeuvre caused a rather severe form of mitochondrial dysfunction. This could be more representative of toxicity rather than pathophysiology. It would be helpful to know if these same effects are observed with other manipulations that lower CoQ to a similar degree. If not, the discrepancies should be discussed.

      Response: We conducted a staining procedure using the mitochondrial marker mitoDsRED to observe the effect of SMPD5 overexpression on cell toxicity. The resulting images, displayed in the figure below (Author response image 1), demonstrate that the overexpression of SMPD5 did not result in any significant changes in cell morphology or impact the differentiation potential of our myoblasts into myotubes.

      Author response image 1.

      In addition, we evaluated cell viability in HeLa cells following exposure to SACLAC (2 µM) to induce CoQ depletion (left panel). Specifically, we measured cell death by monitoring the uptake of propidium iodide (PI), as shown in the right panel. Our results demonstrated that SACLAC-induced CoQ depletion did not lead to cell death at the doses used for CoQ depletion (Author response image 2).

      Author response image 2.

      Therefore, we consider it unlikely that the observed effect is caused by cellular toxicity; rather, it represents a pathological condition induced by elevated levels of ceramides. We will add to discussion:

      “...downregulation of the respirasome induced by ceramides may lead to CoQ depletion. Despite the significant impact of ceramide on mitochondrial respiration, we did not observe any indications of cell damage in any of the treatments, suggesting that our models are not explained by toxic/cell death events.”

      4. The conclusions could be strengthened by more extensive studies in mice to assess the interplay between mitochondrial ceramides, CoQ depletion and ETC/mitochondrial dysfunction in the context of a standard diet versus HF diet-induced insulin resistance. Does P053 affect mitochondrial ceramide, ETC protein abundance, mitochondrial function, and muscle insulin sensitivity in the predicted directions?

      Response: We would like to note that the metabolic characterization and assessment of ETC/mitochondrial function in these mice (fed either a high-fat (HF) or chow diet, with or without P053) were previously published (Turner N, PMID30131496). In addition to this, we have conducted targeted metabolomic and lipidomic analyses to investigate the impact of P053 on ceramide and CoQ levels in HF-fed mice. As illustrated in the figures below (Author response image 3), the administration of P053 led to a reduction in ceramide levels (left panel) and an increase in CoQ levels (right panel) in HF-fed mice, which is consistent with our in vitro findings.

      Author response image 3.

      We will add to results:

      “…a similar effect was observed in mice exposed to a high-fat diet for 5 wks (Supp. Fig. 4H-I; further phenotypic and metabolic characterization of these animals can be found in (41))”

      We will perform further in vivo studies to corroborate these findings.

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      Alonso-Calleja and colleagues explore the role of TGR5 in adult hematopoiesis at both steady state and post-transplantation. The authors utilize two different mouse models including a TGR5-GFP reporter mouse to analyze the expression of TGR5 in various hematopoietic cell subsets. Using germline Tgr5-/- mice it's reported that loss of Tgr5 has no significant impact on steady-state hematopoiesis, with a small decrease in trabecular bone fraction, associated with a reduction in proximal tibia adipose tissue, and an increase in marrow phenotypic adipocytic precursors. The authors further explored the role of stroma TGR5 expression in the hematopoietic recovery upon bone marrow transplantation of wild-type cells, although the studies supporting this claim are weak. Overall, while most of the hematopoietic phenotypes have negative results or small effects, the role of TGR5 in adipose tissue regulation is interesting to the field.

      We thank Reviewer 1 for identifying the strengths and weaknesses of our study. As summarized below, we will work to address the weaknesses.

      Strengths:

      • This is the first time the role of TGR5 has been examined in the bone marrow.

      • This paper supports further exploration of the role of bile acids in bone marrow transplantation and possible therapeutic strategies.

      Weaknesses:

      • The authors fail to describe whether niche stroma cells or adipocyte progenitor cells (APCs) express TGR5.

      We are currently working to address this question using our reporter model and expect to be able to provide the data in the next version of the reviewed preprint.

      • Although the authors note a significant reduction in bone marrow adipose tissue in Tgr5-/- mice, they do not address whether this is white or brown adipose tissue especially since BA-TGR5 signaling has been shown to play a role in beiging.

      The nature of BMAT and how it relates to white or brown/beige adipose tissue has been a persistent question in the field. Our understanding is that BMAT is currently considered a distinct adipose depot that is neither white nor brown/beige. BMAT does not express UCP1 to an appreciable extent, and reports of its expression possibly reflect contamination by tissues surrounding bone (Craft et al., 2019). Beyond this consideration, as the regulated BMAT in TGR5-/- mice is almost absent, determining whether it is brown/beige or white remains technically challenging.

      • In Figure 1, the authors explore different progenitor subsets but stop short of describing whether TGR5 is expressed in hematopoietic stem cells (HSCs).

      Figure 1 of the originally submitted manuscript described TGR5 expression in committed myeloid progenitors (CMP, GMP and MEP). Below we provide the requested data (expression in MPPs and HSCs in Author response image 1) and we have further expanded our data with the expression in megakaryocyte progenitors (MkProg - Lin-cKit+Sca1-CD41+CD150+) as shown in Author response image 2.

      Author response image 1.

      Frequencies of GFP+ cells in MPPs and HSCs in the BM of 8-12-week-old male TGR5:GFP mice and their controls (n=9 for Wild-type control mice, n=11 for TGR5:GFP mice). Results represent the mean ± s.e.m., n represents biologically independent replicates. Two-tailed Student’s t-test was used for statistical analysis. p-values (exact value) are indicated.

      Author response image 2.

      A, representative flow cytometry gating strategy used to identify megakaryocyte progenitors (MkProg) and GFP positivity in TGR5:GFP mice and their wild-type controls. B, frequencies of GFP+ cells in MkProg population in the BM of 8-12-week-old male TGR5:GFP mice and their controls (n=3 for Wild-type control mice, n=4 for TGR5:GFP mice). Results represent the mean ± s.e.m., n represents biologically independent replicates. Two-tailed Student’s t-test (B) was used for statistical analysis. p-values (exact value) are indicated.
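
      The summary statistics named in these legends (mean ± s.e.m. and a two-tailed Student's t-test) can be sketched as follows. This is a minimal illustration using only the Python standard library, assuming equal variances between groups; the data values are hypothetical placeholders and are not the measurements behind the figures. Converting the t statistic into a two-tailed p-value additionally requires the t distribution, which a statistics package would normally provide.

```python
import math
from statistics import mean, stdev

def mean_sem(values):
    """Return (mean, standard error of the mean) for one group."""
    return mean(values), stdev(values) / math.sqrt(len(values))

def student_t(a, b):
    """Student's t statistic (pooled variance) and degrees of freedom."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2

# HYPOTHETICAL % GFP+ values, for illustration only (n=3 and n=4 as in panel B).
wild_type = [0.2, 0.1, 0.3]
reporter = [4.1, 5.3, 3.8, 4.6]

for name, group in (("wild-type", wild_type), ("TGR5:GFP", reporter)):
    m, sem = mean_sem(group)
    print(f"{name}: {m:.2f} ± {sem:.2f} (n={len(group)})")

t_stat, dof = student_t(reporter, wild_type)
print(f"t = {t_stat:.2f} with {dof} degrees of freedom")
```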

      • Are there more CD45+ cells in the BM because hematopoietic cells are proliferating more due to a direct effect of the loss of Tgr5 or is it because there is just more space due to less trabecular bone?

      While we do not have direct evidence to address this question, we see approximately an average 20% increase in CD45+ cell counts in the baseline Tgr5-/- mice. The absolute volume of bone and BMAT lost in these animals does not account for 20% of the total volume of the medullary cavity, so we speculate that the increase in CD45+ counts is not due exclusively to an increase in available volume.

      • In Figure 4 no absolute cell counts are provided to support the increase in immunophenotypic APCs (CD45-Ter119-CD31-Sca1+CD24-) in the stroma of Tgr5-/- mice. Accordingly, the absolute number of total stromal cells and other stroma niche cells such as MSCs, ECs are missing.

      We initially chose not to report the total number of cells per leg, as the processing of the bones for stroma isolation is less homogeneous than that of the HSPC populations (which we do by crushing whole bones with a mortar and pestle). Regardless of these considerations, the data for absolute counts of APCs (left panel), the stroma-enriched fraction (CD45-Ter119-CD31- - middle panel) and endothelial cells (CD45-Ter119-CD31+ - right panel) are provided in Author response image 3. Note that the number of cells plated for CFU-F and BMSC in vitro differentiation is constant between the genotypes, thus confirming the importance of the relative abundance data shown in the submitted version of the manuscript. In conclusion, we have prioritized the data showing the relative overrepresentation of APC progenitors in the BM stroma as measured by flow cytometry on a per-cell basis, which is in line with the functional in vitro data. Further studies could address the specific question through 3D wholemount studies once APC in situ markers are firmly characterized.

      Author response image 3.

      Left panel: absolute number of adipocyte progenitor cells (APCs) in the CD45-Ter119-CD31- BM stromal gate for both Tgr5+/+ and Tgr5−/− (n=5). Middle panel: absolute number of cells isolated from the stroma-enriched BM fraction (CD45-Ter119-CD31-) in the same mice. Right panel: absolute number of endothelial cells, defined as CD45-Ter119-CD31+, in the same BM isolates.

      • There are issues with the reciprocal transplantation design in Fig 4. Why did the authors choose such a low dose (250 000) of BM cells to transplant? If the effect is true and relevant, the early recovery would be observed independently of the setup and a more robust engraftment dataset would be observed without having lethality post-transplant. On the same note, it's surprising that the authors report ~70% lethality post-transplant from wild-type control mice (Fig 4E), according to the literature 200 000 BM cells should ensure the survival of the recipient post-TBI. Overall, the results even in such a stringent setup still show minimal differences and the study lacks further in-depth analyses to support the main claim.

      We thank the reviewer for this comment. We disagree on the relevance of the effect size, as Tgr5-/- mice recover from low levels of platelets significantly faster than the Tgr5+/+ controls. Underlining this relevance, in a clinical setting G-CSF is routinely administered to patients even if it accelerates recovery by only 1-2 days (Trivedi et al., 2009).

      Regarding mortality, we agree that it is higher than expected. We have suffered from cases of swollen muzzles syndrome in our facilities that have greatly hampered our ability to perform myeloablation experiments (Garrett et al., 2019), as even sublethal doses have resulted in the appearance of severe side effects that are reasons for euthanasia under Swiss legislation. For example, a strong reduction in mobility requires immediate euthanasia. All experiments were performed blinded to genotype allocation, so we can reasonably exclude experimenter bias. Finally, it could be argued that mice with more marked symptomatology leading to euthanasia are more likely to have hematopoietic deficits, which in our case was mostly seen for Tgr5+/+ animals. We have therefore chosen to report mortality together with the longitudinal assessment of peripheral blood counts.

      • Mechanistically, how does the loss of Tgr5 impact hematopoietic regeneration following sublethal irradiation?

      The question of a non-lethal hematopoietic stress is a very relevant one. Unfortunately, as described in the previous point, we have been seriously constrained by cases of swollen muzzles syndrome (Garrett et al., 2019) that have stopped us from proceeding with more irradiation studies. We will take advantage of the change of animal facility (Laboratory of Regenerative Hematopoiesis), which will be consolidated during the upcoming year, to address this point in follow-up studies.

      • Only male mice were used throughout this study. It would be beneficial to know whether female mice show similar results.

      We agree with this comment, and we expect to include the characterization of the BM microenvironment (Figure 3 of the current manuscript) in females in the revised version of the manuscript when a suitable cohort becomes available.

      Reviewer #2 (Public Review):

      Summary: In this manuscript, the authors examined the role of the bile acid receptor TGR5 in the bone marrow under steady-state and stress hematopoiesis. They initially showed the expression of TGR5 in hematopoietic compartments and that loss of TGR5 doesn't impair steady-state hematopoiesis. They further demonstrated that TGR5 knockout significantly decreases BMAT, increases the APC population, and accelerates the recovery upon bone marrow transplantation.

      Strengths: The manuscript is well-structured and well-written.

      We thank Reviewer #2 for this comment.

      Weaknesses: The mechanism is not clear, and additional studies need to be performed to support the authors' conclusion.

      We agree with Reviewer #2 that more studies are needed to understand the role of TGR5 in the hematopoietic system. We have been hampered in our studies of stress hematopoiesis by frequent cases of swollen muzzles syndrome (Garrett et al., 2019), which have made it difficult to continue with experiments involving myelosuppression (see response to Reviewer #1 as well). Further studies are planned or ongoing, including determining the role of the microbiome in the observed TGR5 bone and hematopoiesis stress phenotypes, but will be the focus of a separate study.

      References

      Craft, C.S., Robles, H., Lorenz, M.R., Hilker, E.D., Magee, K.L., Andersen, T.L., Cawthorn, W.P., MacDougald, O.A., Harris, C.A., Scheller, E.L., 2019. Bone marrow adipose tissue does not express UCP1 during development or adrenergic-induced remodeling. Sci Rep 9, 17427. https://doi.org/10.1038/s41598-019-54036-x

      Garrett, J., Sampson, C.H., Plett, P.A., Crisler, R., Parker, J., Venezia, R., Chua, H.L., Hickman, D.L., Booth, C., MacVittie, T., Orschell, C.M., Dynlacht, J.R., 2019. Characterization and Etiology of Swollen Muzzles in Irradiated Mice. Radiat Res 191, 31–42. https://doi.org/10.1667/RR14724.1

      Trivedi, M., Martinez, S., Corringham, S., Medley, K., Ball, E.D., 2009. Optimal use of G-CSF administration after hematopoietic SCT. Bone Marrow Transplant 43, 895–908. https://doi.org/10.1038/bmt.2009.75

    1. Author Response

      eLife assessment

      In this valuable study, the authors investigate the mechanism of amyloid nucleation in a cellular system using their novel ratiometric measurements and uncover interesting insights regarding the role of polyglutamine length and the sequence features of glutamine-rich regions on amyloid formation. Overall, the problem is significant and being able to assess nucleation in cells is of considerable relevance. The data, as presented and analyzed, are currently still incomplete. The specific claims would be stronger if based on in vitro measurements that avoid the intricacies of specific cellular systems and that are more suitable for assessing sequence-intrinsic properties.

      We are pleased that the editors find our study valuable. We find that the reviewers’ criticisms largely arise from misunderstandings inherent to the conceptually challenging nature of the topic, rather than fundamental flaws, as we will elaborate here. We are grateful for the opportunity afforded by eLife to engage reviewers in a constructive public dialogue.

      Reviewer #1 (Public Review):

      The authors take on the challenge of defining the core nucleus for amyloid formation by polyglutamine tracts. This rests on the assertion that polyQ forms amyloid structures to the exclusion of all other forms of solids. Using their unique assay, deployed in yeast, the authors attempt to infer the size of the nucleus that templates amyloid formation by polyQ. Further, through a series of sequence titrations, all studied using a single type of assay, the authors converge on an assertion stating that a single polyQ molecule is the nucleus for amyloid formation, that 12-residues make up the core of the nucleus, that it takes ca. 60 Qs in a row to unmask this nucleation potential, and that polyQ amyloid formation belongs to the same universality class as self-poisoned crystallization, which is the hallmark of crystallization from polymer melts formed by large, high molecular weight synthetic polymers. Unfortunately, the authors have decided to lean in hard on their assertions without a critical assessment of whether their findings stand up to scrutiny. If their findings are truly an intrinsic property of polyQ molecules, then their findings should be reconstituted in vitro. Unfortunately, careful and rigorous experiments in vitro show that there is a threshold concentration for forming fibrillar solids. This threshold concentration depends on the flanking sequence context on temperature and on solution conditions. The existence of a threshold concentration defies the expectation of a monomer nucleus. The findings disagree with in vitro data presented by Crick et al., and ignored by the authors. Please see: https://doi.org/10.1073/pnas.1320626110. These reports present data from very different assays, the importance of which was underscored first by Regina Murphy and colleagues. The work of Crick et al., provides a detailed thermodynamic framework - see the SI Appendix. 
      This framework dovetails with theory and simulations of Zhang and Muthukumar, which explains exactly how a system like polyQ might work (https://doi.org/10.1063/1.3050295). The picture one paints is radically different from what the authors converge upon. One is inclined to lean toward data that are gleaned using multiple methods in vitro because the test tube does not have all the confounding effects of a cellular milieu, especially when it comes to focusing on sequence-intrinsic conformational transitions of a protein. In addition to concerns about the limitations of the DAmFRET method, which based on the work of the authors in their collaborative paper by Posey et al., are being stretched to the limit, there is the real possibility that the cellular milieu, unique to the system being studied, is enabling transitions that are not necessarily intrinsic to the sequence alone. A nod in this direction is the work of Marc Diamond, which showed that having stabilized the amyloid form of Tau through coacervation, there is a large barrier that limits the loss of amyloid-like structure for Tau. There may well be something similar going on with the polyQ system. If the authors could show that their data are achievable in vitro without anything but physiological buffers one would have more confidence in a model that appears to contradict basic physical principles of how homopolymers self-assemble. Absent such additional evidence, numerous statements seem to be too strong. There are also several claims that are difficult to understand or appreciate.

      Rebuttal to the perceived necessity of in vitro experiments

      The overarching concern of this reviewer and reviewing editor is whether in-cell assays can inform on sequence-intrinsic properties. We understand this concern. We believe however that the relative merit of in-cell assays is largely a matter of perspective. The truly sequence-intrinsic behavior of polyQ, i.e. in a vacuum, is less informative than the “sequence-intrinsic” behaviors of interest that emerge in the presence of extraneous molecules from the appropriate biological context. In vitro experiments typically include a tiny number of these -- water, ions, and sometimes a crowding agent meant to approximate everything else. Obviously missing are the myriad quinary interactions with other proteins that collectively round out the physiological solvent. The question is what experimental context best approximates that of a living human neuron under which the pathological sequence-dependent properties of polyQ manifest. We submit that a living yeast cell comes closer to that ideal than does buffer in a test tube.

      The reviewer’s statement that our findings must be validated in vitro ignores the fact -- stressed in our introduction -- that decades of in vitro work have not yet generated definitive evidence for or against any specific nucleus model. In addition to the above, one major problem concerns the large sizes of in vitro systems that obscure the effects of primary nucleation. For example, a typical in vitro experimental volume of e.g. 1.5 ml is over one billion-fold larger than the femtoliter volume of a cell. This means that any nucleation-limited kinetics of relevant amyloid formation are lost, and any alternative amyloid polymorphs that have a kinetic growth advantage -- even if they nucleate at only a fraction of the rate of relevant amyloid -- will tend to dominate the system (Buell, 2017). Novel approaches are clearly needed to address these problems. We present such an approach, stretch it to the limit (as the reviewer notes) across multiple complementary experiments, and arrive at a novel finding that is fully and uniquely consistent with all of our own data as well as the collective prior literature.
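      The volume comparison above can be checked with a short calculation. This is only an illustrative sketch: the 1.5 ml figure is from the text, while the cell volume is assumed to be exactly 1 fL and the per-volume nucleation rate is a hypothetical placeholder, not a measured value.

```python
# Volumes in litres. TUBE_VOLUME_L comes from the text; CELL_VOLUME_L is an
# assumption (the text says only "femtoliter volume of a cell").
TUBE_VOLUME_L = 1.5e-3
CELL_VOLUME_L = 1e-15


def fold_difference(large_l: float, small_l: float) -> float:
    """Return how many times larger one volume is than the other."""
    return large_l / small_l


def mean_wait_for_first_event(rate_per_litre_per_s: float, volume_l: float) -> float:
    """Mean waiting time for the first nucleation event in a compartment,
    assuming events arrive at a constant rate proportional to volume."""
    return 1.0 / (rate_per_litre_per_s * volume_l)


ratio = fold_difference(TUBE_VOLUME_L, CELL_VOLUME_L)

# Hypothetical rate chosen so that one cell waits about 1e4 s on average.
HYPOTHETICAL_RATE = 1e11  # events per litre per second (assumption)
wait_cell_s = mean_wait_for_first_event(HYPOTHETICAL_RATE, CELL_VOLUME_L)
wait_tube_s = mean_wait_for_first_event(HYPOTHETICAL_RATE, TUBE_VOLUME_L)

print(f"{ratio:.1e}")
```

      Under these assumptions the ratio is well above one billion, and the first nucleation event in the tube arrives faster than in a single cell by that same factor, which is why a rare nucleation step stops being rate-limiting in a large volume.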

      That the preceding considerations are collectively essential to understand relevant amyloid behavior is evident from recent cryoEM studies showing that in vitro-generated amyloid structures generally differ from those in patients (Arseni et al., 2022; Bansal et al., 2021; Radamaker et al., 2021; Schmidt et al., 2019; Schweighauser et al., 2020; Yang et al., 2022). This is highly relevant to the present discourse because each amyloid structure is thought to emanate from a different nucleating structure. This means that in vitro experiments have broadly missed the mark in terms of the relevant thermodynamic parameters that govern disease onset and progression. Note that the rules laid out via our studies are not only consistent with structural features of polyQ amyloid in cells, but also (as described in the discussion) explain why the endogenous structure of a physiologically relevant Q zipper amyloid differs from that of polyQ.

      A recent collaboration between the Morimoto and Knowles groups (Sinnige et al.) investigated the kinetics of aggregation by Q40-YFP expressed in C. elegans body wall muscle cells, using quantitative approaches that have been well established for in vitro amyloid-forming systems of the type favored by the reviewer. They calculate a reaction order of just 1.6, slightly higher than what would be expected for a monomeric nucleus but nevertheless fully consistent with our own conclusions when one accounts for the following two aspects of their approach. First, the polyQ tract in their construct is flanked by short poly-Histidine tracts on both sides. These charges very likely disfavor monomeric nucleation because all possible configurations of a four-stranded bundle position the beginning and end of the Q tract in close proximity, and Q40 is only just long enough to achieve monomeric nucleation in the absence of such destabilization. Second, the protein is fused to YFP, a weak homodimer (Landgraf et al., 2012; Snapp et al., 2003). With these two considerations, our model -- which was generated from polyQ tracts lacking flanking charges or an oligomeric fusion -- predicts that amyloid nucleation by their construct will occur more frequently as a dimer than a monomer. Indeed, their observed reaction order of 1.6 supports a predominantly dimeric nucleus. Like us and others, Sinnige et al. did not observe phase separation prior to amyloid formation. This is important because it not only argues against nucleation occurring in a condensate, it also suggests that the reaction order they calculated has not been limited by the concentration-buffering effect of phase separation.
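      To illustrate why a non-integer reaction order can arise from a mixture of nucleating species, the sketch below treats the reaction order as the slope of log(rate) against log(concentration). The rate law, rate constants, and concentrations are hypothetical and chosen only for illustration; this is not the kinetic analysis used by Sinnige et al.

```python
import math


def apparent_order(concs, rates):
    """Least-squares slope of log(rate) against log(concentration)."""
    xs = [math.log(c) for c in concs]
    ys = [math.log(r) for r in rates]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def mixed_rate(c, k_mono, k_dimer):
    """Hypothetical rate law: a monomeric pathway (order 1) plus a dimeric
    pathway (order 2) contributing to nucleation in parallel."""
    return k_mono * c + k_dimer * c ** 2


concs = [0.5, 1.0, 2.0, 4.0]  # arbitrary units, illustrative only

pure_mono = apparent_order(concs, [mixed_rate(c, 1.0, 0.0) for c in concs])
pure_dimer = apparent_order(concs, [mixed_rate(c, 0.0, 1.0) for c in concs])
mixed = apparent_order(concs, [mixed_rate(c, 1.0, 1.0) for c in concs])

print(pure_mono, pure_dimer, mixed)
```

      With only one pathway active the fitted order is 1 or 2; with both active it falls between the two, and moves toward 2 as the dimeric pathway becomes dominant.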

      While we agree that our conclusions rest heavily on DAmFRET data (for good reason), we do provide supporting evidence from molecular dynamics simulations, SDD-AGE, and microscopy.

      To summarize, given the extreme limitations of in vitro experiments in this field, the breadth of our current study, and supporting findings from another lab using rigorous quantitative approaches, we feel that our claims are justified without in vitro data.

      Rebuttal to the perceived incompatibility of monomeric nucleation with the existence of a critical concentration for amyloid

      We appreciate that the concept of a monomeric nucleus can superficially appear inconsistent with the fact that crystalline solids such as polyQ amyloid have a saturating concentration, but this is only true if one neglects that polyQ amyloids are polymer crystals with intramolecular ordering. The perceived discrepancy is perhaps most easily dispelled by protein crystallography. Folded proteins form crystals. These crystals have critical concentrations, and the protein subunits within them each have intramolecular crystalline order (in the form of secondary structure). To extrapolate these familiar examples to our present finding with polyQ, one need only appreciate the now well-established phenomenon of secondary nucleation, whereby transient interactions of soluble species with the ordered species lead to their own ordering (Törnquist et al., 2018). Transience is important here because it implies that intramolecular ordering can in principle propagate even in solutions that are subsaturated with respect to bulk crystallization. This is possible in the present case because the pairing of sufficiently short beta strands (equivalent to “stems” in the polymer crystal literature) will be more stable intramolecularly than intermolecularly, due to the reduced entropic penalty of the former. Our elucidation that Q zipper ordering can occur with shorter strands intramolecularly than intermolecularly (Fig. S4C-D) demonstrates this fact. It is also evident from published descriptions of single molecule “crystals” formed in sufficiently dilute solutions of sufficiently long polymers (Hong et al., 2015; Keller, 1957; Lauritzen and Hoffman, 1960).

      In suggesting that a saturating concentration for amyloid rules out monomeric nucleation, the reviewer assumes that the Q zipper-containing monomer must be stable relative to the disordered ensemble. This is not inherent to our claim and in fact opposes the definition of a nucleus. The monomeric nucleating structure need not be more stable than the disordered state, and monomers may very well be disordered at equilibrium at low concentrations. To be clear, our claim requires that the Q zipper-containing monomer is both on pathway to amyloid and less stable than all subsequent species that are on pathway to amyloid. The former requirement is supported by our extensive mutational analysis. The latter requirement is supported by our atomistic simulations showing the Q zipper-containing monomer is stabilized by dimerization (see our 2021 preprint). Hence, requisite ordering in the nucleating monomer is stabilized by intermolecular interactions. We provide in Author response image 1 an illustration to clarify what we believe to be the discrepancy between our claim and the reviewer’s interpretation.

      Author response image 1.

      That the rate-limiting fluctuation for a crystalline phase can occur in a monomer can also be understood as a consequence of Ostwald’s rule of stages, which describes the general tendency of supersaturated solutes, including amyloid forming proteins (Chakraborty et al., 2023), to populate metastable phases en route to more stable phases (De Yoreo, 2022; Schmelzer and Abyzov, 2017). Our findings with polyQ are consistent with a general mechanism for Ostwald’s rule wherein the relative stabilities of competing polymorphs differ with the number of subunits (De Yoreo, 2022; Navrotsky, 2004). As illustrated in Fig. 6 of Navrotsky, a polymorph that is relatively stable at small particle sizes tends to give way to a polymorph that -- while initially unstable -- becomes more stable as the particles grow. The former is analogous to our early stage Q zipper composed of two short sheets with an intramolecular interface, while the latter is analogous to the later stage Q zipper composed of longer sheets with an intermolecular interface. Subunit addition stabilizes the latter more than the former, hence the initial Q zipper that is stabilized more by intra- than intermolecular interactions will mature with growth to one that is stabilized more by intermolecular interactions.

      We apologize to the Pappu group for neglecting to cite Crick et al. 2013 in the current preprint. Contrary to the reviewer’s assessment, however, we find that the conclusions of this valuable study do more to support than to refute our findings. Briefly, Crick et al. investigated the aggregation of synthetic Q30 and Q40 peptides in vitro, wherein fibrils assembled from high concentrations of peptide were demonstrated to have saturating concentrations in the low micromolar range. As explained above, this finding of a saturating concentration does not refute our results. More relevant to the present work are their findings that “oligomers” accumulated over an hours-long timespan in solutions that are subsaturated with respect to fibrils, and these oligomers themselves have (nanomolar) critical concentrations. The authors postulated that the oligomers result from liquid–liquid demixing of intrinsically disordered polyglutamine. However, phase separation by a peptide is expected to fix its concentration in both the solute and condensed phases, and, because disordered phase separation is inherently faster than amyloid formation, the postulated explanation removes the driving force for any amyloid phase with a critical solubility greater than that of the oligomers. In place of this interpretation that truly does appear to -- in the reviewer’s words -- “contradict basic physical principles of how homopolymers self-assemble”, we interpret these oligomers as evidence of our Q zipper-containing self-poisoned multimers, rounded as an inherent consequence of self-poisoning (Ungar et al., 2005), and likely akin to semicrystalline spherulites that have been observed in other polymer crystal and amyloid-forming systems (Crist and Schultz, 2016; Vetri and Foderà, 2015). That Crick et al. also observed the formation of a relatively labile amyloid phase when the reactions were started with 50 µM peptide is unsurprising in light of the aforementioned kinetic advantage that large reaction volumes can confer to labile polymorphs, and that high concentrations (in this case, orders of magnitude higher than the likely physiological concentration of polyQ (Wild et al., 2015)) can favor the formation of labile amyloid polymorphs (Ohhashi et al., 2010). Indeed, a contemporaneous study by the Wetzel group using very similar peptide constructs and polyQ lengths -- but beginning with lower concentrations -- found that the relevant saturating concentrations for amyloid lie below their limit of detection of 100 nM (Sahoo et al., 2014).

      Rebuttals to other critiques

      The reviewer states that we found nucleation potential to require 60 Qs in a row. Our data are collectively consistent with nucleation occurring at and above approximately 36 Qs, a point repeated in the paper. The reviewer may be referring to our statement, “Sixty residues proved to be the optimum length to observe both the pre- and post-nucleated states of polyQ in single experiments”. The purpose of this statement is simply to describe the practical consideration that led us to use 60 Qs for the bulk of our assays. We do appreciate that the fraction of AmFRET-positive cells is very low for lengths just above the threshold, especially Q40. These fractions are nevertheless highly significant (p = 0.004 in [PIN+] cells, one-tailed t-test), and we will modify the figure and text to clarify this.

      The reviewer characterizes self-poisoning as the hallmark of crystallization from polymer melts, which would be problematic for our conclusions if self-poisoning were limited to this non-physiological context. In fact the term was first used to describe crystallization from solution (Organ et al., 1989), wherein the phenomenon is more pronounced (Ungar et al., 2005).

      Reviewer #2 (Public Review):

      Numerous neurodegenerative diseases are thought to be driven by the aggregation of proteins into insoluble filaments known as "amyloids". Despite decades of research, the mechanism by which proteins convert from the soluble to insoluble state is poorly understood. In particular, the initial nucleation step has proven especially elusive to both experiments and simulation. This is because the critical nucleus is thermodynamically unstable, and therefore, occurs too infrequently to directly observe. Furthermore, after nucleation much faster processes like growth and secondary nucleation dominate the kinetics, which makes it difficult to isolate the effects of the initial nucleation event. In this work Kandola et al. attempt to surmount these obstacles using individual yeast cells as microscopic reaction vessels. The large number of cells, and their small size, provides the statistics to separate the cells into pre- and post-nucleation populations, allowing them to obtain nucleation rates under physiological conditions. By systematically introducing mutations into the amyloid-forming polyglutamine core of huntingtin protein, they deduce the probable structure of the amyloid nucleus. This work shows that, despite the complexity of the cellular environment, the seemingly random effects of mutations can be understood with a relatively simple physical model. Furthermore, their model shows how amyloid nucleation and growth differ in significant ways, which provides testable hypotheses for probing how different steps in the aggregation pathway may lead to neurotoxicity.

      In this study Kandola et al. probe the nucleation barrier by observing a bimodal distribution of cells that contain aggregates; the cells containing aggregates have had a stochastic fluctuation allowing the proteins to surmount the barrier, while those without aggregates have yet to have a fluctuation of suitable size. The authors confirm this interpretation with the selective manipulation of the PIN gene, which provides an amyloid template that allows the system to skip the nucleation event.
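      The bimodal picture described above can be made concrete with a minimal sketch in which every cell is an independent vessel that nucleates at a constant rate, so the fraction of nucleated cells after time t is 1 - exp(-k t). The rate, duration, and cell count below are hypothetical, and this is not the DAmFRET analysis itself.

```python
import math
import random


def fraction_nucleated(rate_per_hour, hours):
    """Probability that a cell has nucleated at least once by `hours`,
    assuming a constant (Poisson) nucleation rate per cell."""
    return 1.0 - math.exp(-rate_per_hour * hours)


def rate_from_fraction(fraction, hours):
    """Invert the relation above to recover the per-cell nucleation rate."""
    return -math.log(1.0 - fraction) / hours


def simulate_cells(n_cells, rate_per_hour, hours, seed=0):
    """Draw one exponential waiting time per cell and return the fraction
    of cells whose first nucleation event falls within the time window."""
    rng = random.Random(seed)
    nucleated = sum(
        rng.expovariate(rate_per_hour) <= hours for _ in range(n_cells)
    )
    return nucleated / n_cells


RATE = 0.05   # events per cell per hour (hypothetical)
HOURS = 16.0  # duration of expression (hypothetical)

expected = fraction_nucleated(RATE, HOURS)
observed = simulate_cells(20000, RATE, HOURS)
recovered_rate = rate_from_fraction(expected, HOURS)

print(expected, observed, recovered_rate)
```

      Each simulated cell is either nucleated or not, which gives the two populations; supplying a pre-existing template corresponds, in this toy model, to a rate so high that essentially every cell falls in the nucleated population.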

      In simple systems lacking internal degrees of freedom (i.e., colloids or rigid molecules) the nucleation barrier comes from a significant entropic cost that comes from bringing molecules together. In large aggregates this entropic cost is balanced by attractive interactions between the particles, but small clusters are unable to form the extensive network of stabilizing contacts present in the larger aggregates. Therefore, the initial steps in nucleation incur an entropic cost without compensating attractive interactions (this imbalance can be described as a surface tension). When internal degrees of freedom are present, such as the conformational states of a polypeptide chain, there is an additional contribution to the barrier coming from the loss of conformational entropy required to adopt the aggregation-prone state(s). In such systems the clustering and conformational processes do not necessarily coincide, and a major challenge in studying nucleation is to separate out these two contributions to the free energy barrier. Surprisingly, Kandola et al. find that the critical nucleus occurs within a single molecule. This means that the largest contribution to the barrier comes from the conformational entropy cost of adopting the beta-sheet state. Once this state is attained, additional molecules can be recruited with a much lower free energy barrier.

      There are several caveats that come with this result. First, the height of the nucleation barrier(s) comes from the relative strength of the entropic costs compared to the binding affinities. This balance determines how large a nascent nucleus must grow before it can form interactions comparable to a mature aggregate. In amyloid nuclei the first three beta strands form immature contacts consisting of either side chain or backbone contacts, whereas the fourth strand is the first that is able to form both kinds of contacts (as in a mature fibril). This study used relatively long polypeptides of 60 amino acids. This is greater than the 20-40 amino acids found in amyloid-forming molecules like ABeta or IAPP. As a result, Kandola et al.'s molecules are able to fold enough times to create four beta strands and generate mature contacts intramolecularly. The authors make the plausible claim that these intramolecular folds explain the well-known length threshold (L~35) observed in polyQ diseases. The intramolecular folds reduce the importance of clustering multiple molecules together and increase the importance of the conformational states. Similarly, manipulating the sequence or molecular concentrations will be expected to manipulate the relative magnitude of the binding affinities and the clustering entropy, which will shift the relative heights of the entropic barriers.

      The reviewer correctly notes that the majority of our manipulations were conducted with 60-residue long tracts (which corresponds to disease onset in early adulthood), and this length facilitates intramolecular nucleation. However, we also analyzed a length series of polyQ spanning the pathological threshold, as well as a synthetic sequence designed explicitly to test the model nucleus structure with a tract shorter than the pathological threshold, and both experiments corroborate our findings.

      The authors make an important point that the structure of the nucleus does not necessarily resemble that of the mature fibril. They find that the critical nucleus has a serpentine structure that is required by the need to form four beta strands to get the first mature contacts. However, this structure comes at a cost because residues in the hairpins cannot form strong backbone or zipper interactions. Mature fibrils offer a beta sheet template that allows incoming molecules to form mature contacts immediately. Thus, it is expected that the role of the serpentine nucleus is to template a more extended beta sheet structure that is found in mature fibrils.

      A second caveat of this work is the striking homogeneity of the nucleus structure they describe. This homogeneity is likely to be somewhat illusory. Homopolymers, like polyglutamine, have a discrete translational symmetry, which implies that the hairpins needed to form multiple beta sheets can occur at many places along the sequence. The asparagine residues introduced by the authors place limitations on where the hairpins can occur, and should be expected to increase structural homogeneity. Furthermore, the authors demonstrate that polyglutamine chains close to the minimum length of ~35 will have strict limitations on where the folds must occur in order to attain the required four beta strands.

      We are unsure how to interpret the above statements as a caveat. We agree that increasing sequence complexity will tend to increase homogeneity, but this is exactly the motivation of our approach. We explicitly set out to determine the minimal complexity sequence sufficient to specify the nucleating conformation, which we ultimately identified in terms of secondary and tertiary structure. We do not specify which parts of a long polyQ tract correspond to which parts of the structure, because, as the reviewer points out, they can occur at many places. Hence, depending on the length of the polyQ tract, the nucleus we describe may have any length of sequence connecting the strand elements. We do not think that the effects of N-residue placement can be interpreted as a confounding influence on hairpin position because the striking even-odd pattern we observe implicates the sides of beta strands rather than the lengths. Moreover, we observe this pattern regardless of the residue used (Gly, Ser, Ala, and His in addition to Asn).

      A novel result of this work is the observation of multiple concentration regimes in the nucleation rate. Specifically, they report a plateau-like regime at intermediate concentrations in which the nucleation rate is insensitive to protein concentration. The authors attribute this effect to the "self-poisoning" phenomenon observed in growth of some crystals. This is a valid comparison because the homogeneity observed in NMR and crystallography structures of mature fibrils resembles that of a one-dimensional crystal. Furthermore, the typical elongation rate of amyloid fibrils (on the order of one molecule per second) is many orders of magnitude slower than the molecular collision rate (by factors of 10^6 or more), implying that the search for the beta-sheet state is very slow. This slow conformational search implies the presence of deep kinetic traps that would be prone to poisoning phenomena. However, the observation of poisoning during nucleation is striking, particularly in consideration of the expected disorder and concentration sensitivity of the nucleus. Kandola et al.'s structural model of an ordered, intramolecular nucleus explains why the internal states responsible for poisoning are relevant in nucleation.

      We thank the reviewer for noting the novelty and plausibility of the self-poisoning connection. We would like to elaborate on our finding that self-poisoning inhibits nucleation (in addition to elongation), as this could prove confusing to some readers. While self-poisoning is claimed to inhibit primary nucleation in the polymer crystal literature (Ungar et al., 2005; Zhang et al., 2018), the semantics of “nucleation” in this context warrants clarification. Technically, the same structure can be considered a nucleus in one context but not in another. The Q zipper monomer, even if it is rate-limiting for amyloid formation at low concentrations (and is therefore the “nucleus”), is not necessarily rate-limiting when self-poisoned at high concentrations. Whether it comprises the nucleus in this case depends on the rates of Q zipper formation relative to subunit addition to the poisoned state. If the latter happens slower than Q zipper formation de novo, it can be said that self-poisoning inhibits nucleation, regardless of whether the Q zipper formed. We suspect this to be the mechanism by which preemptive oligomerization blocks nucleation in the case of polyQ, though other mechanisms may be possible.

      To achieve these results the authors used a novel approach involving a systematic series of simple sequences. This is significant because, while individual experiments showed seemingly random behavior, the randomness resolved into clear trends with the systematic approach. These trends provided clues to build a model and guide further experiments.

      Reviewer #3 (Public Review):

      Kandola et al. explore the important and difficult question regarding the initiating event that triggers (nucleates) amyloid fibril growth in glutamine-rich domains. The researchers use a fluorescence technique that they developed, DAmFRET, in a yeast system where they can manipulate the expression level over several orders of magnitude, and they can control the length of the polyglutamine domain as well as the insertion of interfering non-glutamine residues. Using flow cytometry, they can interrogate each of these yeast 'reactors' to test for self-assembly, as detected by FRET.

      In the introduction, the authors provide a fairly thorough yet succinct review of the relevant literature into the mechanisms of polyglutamine-mediated aggregation over the last two decades. The presentation as well as the illustrations in Figure 1A and 1B are difficult to understand, and unfortunately, there is no clear description of the experimental technique that would allow the reader to connect the hypothetical illustrations to the measurement outcomes. The authors do not explain what the FRET signal specifically indicates or what its intensity is correlated to. FRET measures distance between donor and acceptor, but can it be reliably taken as an indicator of a specific beta-sheet conformation and of amyloid? Does the signal increase with both nucleation and with elongation, and is the signal intensity the same if, e.g., there were 5 aggregates of 10 monomers each versus 50 monomeric nuclei? Is there a reason why the AmFRET signal intensity decreases at longer Q even though the number of cells with positive signal increases? Does the number of positive cells increase with time? The authors state later that 'non-amyloid containing cells lacked AmFRET altogether', but this seems to be a tautology - isn't the lack of AmFRET taken as a proof of lack of amyloid? Overall, a clearer description of the experimental method and what is actually measured (and validation of the quantitative interpretation of the FRET signal) would greatly assist the reader in understanding and interpreting the data.

      We believe the difficulty in understanding the illustrations in Figure 1A and 1B is inherent to the subject. We agree that elaborating how DAmFRET works would help the reader, and will add a few sentences to this end. Beyond this, we refer the reviewer and readers to our cited prior work describing the theory and interpretation of DAmFRET. Note that the y-axes of DAmFRET plots are not raw FRET but rather “AmFRET”, a ratio of FRET to total expression level. As explained thoroughly in our cited prior work, the discontinuity of AmFRET with expression level indicates that the high-AmFRET population formed via a disorder-to-order transition. When the query protein is predicted to be intrinsically disordered, the discontinuous transition to high AmFRET invariably (among hundreds of proteins tested in prior published and unpublished work) signifies amyloid formation, as corroborated by SDD-AGE and tinctorial assays.

      When performed using standard flow cytometry as in the present study, every AmFRET measurement corresponds to a cell-wide average, and hence does not directly inform on the distribution of the protein between different stoichiometric species. As there is only one fluorophore per protein molecule, monomeric nuclei have no signal. DAmFRET can distinguish cells expressing monomers from stable dimers from higher order oligomers (see e.g. Venkatesan et al. 2019), and we are therefore quite confident that AmFRET values of zero correspond to cells in which a vast majority of the respective protein is not in homo-oligomeric species (i.e. is monomeric or in hetero-complexes with endogenous proteins). The exact value of AmFRET, even for species with the same stoichiometry, will depend both on the effect of their respective geometries on the proximity of mEos3.1 fluorophores, and on the fraction of protein molecules in the species. Hence, we only attempt to interpret the plateau values of AmFRET (where the fraction of protein in an assembled state approaches unity) as directly informing on structure, as we did in Fig. S3A.
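      To make the quantity concrete, the following is a minimal sketch of how a per-cell AmFRET ratio and the fraction of AmFRET-positive cells could be computed, assuming only the definition given above (FRET signal divided by total expression level, one value per cell). The function names, the example intensities, and the positivity threshold are hypothetical illustrations and are not taken from the authors' analysis pipeline.

```python
from typing import List, Sequence


def amfret(fret: Sequence[float], expression: Sequence[float]) -> List[float]:
    """Return the per-cell ratio of FRET signal to total expression signal.

    Cells with non-positive expression signal are skipped, since the
    ratio is undefined for them.
    """
    if len(fret) != len(expression):
        raise ValueError("fret and expression must have the same length")
    return [f / e for f, e in zip(fret, expression) if e > 0]


def fraction_positive(values: Sequence[float], threshold: float) -> float:
    """Return the fraction of cells whose AmFRET exceeds the threshold."""
    if not values:
        return 0.0
    return sum(v > threshold for v in values) / len(values)


# Hypothetical intensities for four cells (arbitrary units).
example_fret = [0.0, 5.0, 120.0, 300.0]
example_expression = [100.0, 200.0, 150.0, 300.0]

ratios = amfret(example_fret, example_expression)
positive = fraction_positive(ratios, threshold=0.5)  # threshold is an assumption
print(ratios, positive)
```

      Because each value is a cell-wide average, a sketch like this reports only whether a cell's protein is predominantly assembled; it cannot resolve how the protein is distributed among species of different stoichiometry.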

      We believe that AmFRET decreases with longer polyQ because the mass fraction of fluorophore decreases in the aggregate, simply because the extra polypeptide takes up volume in the aggregate.

      Yes, the fraction of positive cells in a discontinuous DAmFRET plot does increase with time. However, given the more laborious data collection and derivation of nucleation kinetics in a system with ongoing translation, especially across hundreds of experiments with other variables, ours is a snapshot measurement to approximately derive the relative contributions of intra- and intermolecular fluctuations to the nucleation barrier, rather than the barrier’s magnitude.

      We will revise the tautological statement by removing “non-amyloid containing”.

      The authors demonstrate that their assay shows that the fraction of cells with AmFRET signal increases strongly with an increase in polyQ length, with a threshold around 50-60 glutamines. This roughly correlates with the Q-length dependence of disease. The experiments in which asparagine or other amino acids are inserted at variable positions in the glutamine repeat are creative and thorough, and the data along with the simulations provide compelling support for the proposed Q zipper model. The experiments shown in Figure 5 are strongly supportive of a model where formation of the beta-sheet nucleus is within a monomer. This is a potentially important result, as there are conflicting data in the literature as to whether the nucleus in polyQ is a monomer.

      We thank the reviewer for these comments. We wish to clarify one important point, however, concerning the correlation of our data with the pathological length threshold. As we state in the first results section, “Our data recapitulated the pathologic threshold -- Q lengths 35 and shorter lacked AmFRET, indicating a failure to aggregate or even appreciably oligomerize, while Q lengths 40 and longer did acquire AmFRET in a length and concentration-dependent manner”. Hence, most of our experiments were conducted with 60Q not because it resembles the pathological threshold, but rather because it was most convenient for DAmFRET experiments.

      I did not find the argument, that their data shows the Q zipper grows in two dimensions, compelling; there are more direct experimental methods to answer this question. I was also confused by the section that Q zippers poison themselves. It would be easier for the reader to follow if the authors first presented their results without interpretation. The data seem more consistent with an argument that, at high concentrations, non-structured polyQ oligomers form which interfere with elongation into structured amyloid assemblies - but such oligomers would not be zippers.

      Self-poisoning is a widely observed and heavily studied phenomenon in polymer crystal physics, though it seems not yet to have entered the lexicon of amyloid biologists. We were new to this concept before it emerged as an extremely parsimonious explanation for our results. As described in the text, two pieces of evidence exclude the alternative mechanism suggested by the reviewer -- that non-structured oligomers form and subsequently engage and inhibit the template. Specifically, 1) inhibition occurs without any detectable FRET, even at high total protein concentration, indicating the species do not form in a concentration-dependent manner that would be expected of disordered oligomers; and 2) inhibition itself has strict sequence requirements that match those of Q zippers. Hence our data collectively suggest that inhibition is a consequence of the deposition of partially ordered molecules onto the templating surface.

      Although some speculation or hypothesizing is perfectly appropriate in the discussion, overall the authors stretch this beyond what can be supported by the results. A couple of examples: The conclusion that toxicity arises from 'self-poisoned polymer crystals' is not warranted, as there is no relevant data presented in this manuscript. The authors refer to findings 'that kinetically arrested aggregates emerge from the same nucleating event responsible for amyloid formation', but I cannot recall any evidence for this statement in the results section.

      We restricted any mention of toxicity to the introduction and a section in the discussion that is not worded as conclusive. Nevertheless, we will soften the subheading and text of the relevant section in the discussion to more clearly indicate the speculative nature of the statements.

      We stand by our statement 'that kinetically arrested aggregates emerge from the same nucleating event responsible for amyloid formation', as this follows directly from self-poisoning.

      Bibliography

      Arseni D, Hasegawa M, Murzin AG, Kametani F, Arai M, Yoshida M, Ryskeldi-Falcon B. 2022. Structure of pathological TDP-43 filaments from ALS with FTLD. Nature 601:139–143. doi:10.1038/s41586-021-04199-3

      Bansal A, Schmidt M, Rennegarbe M, Haupt C, Liberta F, Stecher S, Puscalau-Girtu I, Biedermann A, Fändrich M. 2021. AA amyloid fibrils from diseased tissue are structurally different from in vitro formed SAA fibrils. Nat Commun 12:1013. doi:10.1038/s41467-021-21129-z

      Buell AK. 2017. The Nucleation of Protein Aggregates - From Crystals to Amyloid Fibrils. Int Rev Cell Mol Biol 329:187–226. doi:10.1016/bs.ircmb.2016.08.014

      Chakraborty D, Straub JE, Thirumalai D. 2023. Energy landscapes of Aβ monomers are sculpted in accordance with Ostwald’s rule of stages. Sci Adv 9:eadd6921. doi:10.1126/sciadv.add6921

      Crist B, Schultz JM. 2016. Polymer spherulites: A critical review. Prog Polym Sci 56:1–63. doi:10.1016/j.progpolymsci.2015.11.006

      De Yoreo JJ. 2022. Casting a bright light on Ostwald’s rule of stages. Proc Natl Acad Sci USA 119. doi:10.1073/pnas.2121661119

      Hong Y, Yuan S, Li Z, Ke Y, Nozaki K, Miyoshi T. 2015. Three-Dimensional Conformation of Folded Polymers in Single Crystals. Phys Rev Lett 115:168301. doi:10.1103/PhysRevLett.115.168301

      Keller A. 1957. A note on single crystals in polymers: Evidence for a folded chain configuration. Philosophical Magazine 2:1171–1175. doi:10.1080/14786435708242746

      Landgraf D, Okumus B, Chien P, Baker TA, Paulsson J. 2012. Segregation of molecules at cell division reveals native protein localization. Nat Methods 9:480–482. doi:10.1038/nmeth.1955

      Lauritzen JI, Hoffman JD. 1960. Theory of Formation of Polymer Crystals with Folded Chains in Dilute Solution. J Res Natl Bur Stand A Phys Chem 64A:73–102. doi:10.6028/jres.064A.007

      Navrotsky A. 2004. Energetic clues to pathways to biomineralization: precursors, clusters, and nanoparticles. Proc Natl Acad Sci USA 101:12096–12101. doi:10.1073/pnas.0404778101

      Ohhashi Y, Ito K, Toyama BH, Weissman JS, Tanaka M. 2010. Differences in prion strain conformations result from non-native interactions in a nucleus. Nat Chem Biol 6:225–230. doi:10.1038/nchembio.306

      Organ SJ, Ungar G, Keller A. 1989. Rate minimum in solution crystallization of long paraffins. Macromolecules 22:1995–2000. doi:10.1021/ma00194a078

      Radamaker L, Baur J, Huhn S, Haupt C, Hegenbart U, Schönland S, Bansal A, Schmidt M, Fändrich M. 2021. Cryo-EM reveals structural breaks in a patient-derived amyloid fibril from systemic AL amyloidosis. Nat Commun 12:875. doi:10.1038/s41467-021-21126-2

      Sahoo B, Singer D, Kodali R, Zuchner T, Wetzel R. 2014. Aggregation behavior of chemically synthesized, full-length huntingtin exon1. Biochemistry 53:3897–3907. doi:10.1021/bi500300c

      Schmelzer JWP, Abyzov AS. 2017. How do crystals nucleate and grow: Ostwald’s rule of stages and beyond. In: Šesták J, Hubík P, Mareš JJ, editors. Thermal Physics and Thermal Analysis, Hot Topics in Thermal Analysis and Calorimetry. Cham: Springer International Publishing. pp. 195–211. doi:10.1007/978-3-319-45899-1_9

      Schmidt M, Wiese S, Adak V, Engler J, Agarwal S, Fritz G, Westermark P, Zacharias M, Fändrich M. 2019. Cryo-EM structure of a transthyretin-derived amyloid fibril from a patient with hereditary ATTR amyloidosis. Nat Commun 10:5008. doi:10.1038/s41467-019-13038-z

      Schweighauser M, Shi Y, Tarutani A, Kametani F, Murzin AG, Ghetti B, Matsubara T, Tomita T, Ando T, Hasegawa K, Murayama S, Yoshida M, Hasegawa M, Scheres SHW, Goedert M. 2020. Structures of α-synuclein filaments from multiple system atrophy. Nature 585:464–469. doi:10.1038/s41586-020-2317-6

      Snapp EL, Hegde RS, Francolini M, Lombardo F, Colombo S, Pedrazzini E, Borgese N, Lippincott-Schwartz J. 2003. Formation of stacked ER cisternae by low affinity protein interactions. J Cell Biol 163:257–269. doi:10.1083/jcb.200306020

      Törnquist M, Michaels TCT, Sanagavarapu K, Yang X, Meisl G, Cohen SIA, Knowles TPJ, Linse S. 2018. Secondary nucleation in amyloid formation. Chem Commun 54:8667–8684. doi:10.1039/c8cc02204f

      Ungar G, Putra EGR, de Silva DSM, Shcherbina MA, Waddon AJ. 2005. The Effect of Self-Poisoning on Crystal Morphology and Growth Rates. In: Allegra G, editor. Interphases and Mesophases in Polymer Crystallization I, Advances in Polymer Science. Berlin, Heidelberg: Springer Berlin Heidelberg. pp. 45–87. doi:10.1007/b107232

      Vetri V, Foderà V. 2015. The route to protein aggregate superstructures: Particulates and amyloid-like spherulites. FEBS Lett 589:2448–2463. doi:10.1016/j.febslet.2015.07.006

      Wild EJ, Boggio R, Langbehn D, Robertson N, Haider S, Miller JRC, Zetterberg H, Leavitt BR, Kuhn R, Tabrizi SJ, Macdonald D, Weiss A. 2015. Quantification of mutant huntingtin protein in cerebrospinal fluid from Huntington’s disease patients. The Journal of Clinical Investigation.

      Yang Y, Arseni D, Zhang W, Huang M, Lövestam S, Schweighauser M, Kotecha A, Murzin AG, Peak-Chew SY, Macdonald J, Lavenir I, Garringer HJ, Gelpi E, Newell KL, Kovacs GG, Vidal R, Ghetti B, Ryskeldi-Falcon B, Scheres SHW, Goedert M. 2022. Cryo-EM structures of amyloid-β 42 filaments from human brains. Science 375:167–172. doi:10.1126/science.abm7285

      Zhang X, Zhang W, Wagener KB, Boz E, Alamo RG. 2018. Effect of Self-Poisoning on Crystallization Kinetics of Dimorphic Precision Polyethylenes with Bromine. Macromolecules 51:1386–1397. doi:10.1021/acs.macromol.7b02745

    1. Author response:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors investigated the effect of chronic activation of dopamine neurons using chemogenetics. Using Gq-DREADDs, the authors chronically activated midbrain dopamine neurons and observed that these neurons, particularly their axons, exhibit increased vulnerability and degeneration, resembling the pathological symptoms of Parkinson's disease. Baseline calcium levels in midbrain dopamine neurons were also significantly elevated following the chronic activation. Lastly, to identify cellular and circuit-level changes in response to dopaminergic neuronal degeneration caused by chronic activation, the authors employed spatial genomics (Visium) and revealed comprehensive changes in gene expression in the mouse model subjected to chronic activation. In conclusion, this study presents novel data on the consequences of chronic hyperactivation of midbrain dopamine neurons.

      Strengths:

      This study provides direct evidence that the chronic activation of dopamine neurons is toxic and gives rise to neurodegeneration. In addition, the authors achieved the chronic activation of dopamine neurons using water application of clozapine-N-oxide (CNO), a method not commonly employed by researchers. This approach may offer new insights into pathophysiological alterations of dopamine neurons in Parkinson's disease. The authors also utilized state-of-the-art spatial gene expression analysis, which can provide valuable information for other researchers studying dopamine neurons. Although the authors did not elucidate the mechanisms underlying dopaminergic neuronal and axonal death, they presented a substantial number of intriguing ideas in their discussion, which are worth further investigation.

      We thank the reviewer for these positive comments.

      Weaknesses:

      Many claims raised in this paper are only partially supported by the experimental results. So, additional data are necessary to strengthen the claims. The effects of chronic activation of dopamine neurons are intriguing; however, this paper does not go beyond reporting phenomena. It lacks a comprehensive explanation for the degeneration of dopamine neurons and their axons. While the authors proposed possible mechanisms for the degeneration in their discussion, such as differentially expressed genes, these remain experimentally unexplored.

      We thank the reviewer for this review. We do believe that the manuscript has a mechanistic component, as the central experiments involve direct manipulation of neuronal activity, and we show an increase in calcium levels and gene expression changes in dopamine neurons that coincide with the degeneration. However, we agree that deeper mechanistic investigation would strengthen the conclusions of the paper. We have planned several important revisions, including the addition of CNO behavioral controls, manipulation of intracellular calcium using isradipine, additional transcriptomics experiments and further validation of findings. We anticipate that these additions will significantly bolster the conclusions of the paper.

      Reviewer #2 (Public Review):

      Summary:

      Rademacher et al. present a paper showing that chronic chemogenetic excitation of dopaminergic neurons in the mouse midbrain results in differential degeneration of axons and somas across distinct regions (SNc vs VTA). These findings are important. This mouse model also has the advantage of showing an axon-first degeneration over an experimentally useful time course (2-4 weeks). The finding that direct excitation of dopaminergic neurons causes differential degeneration sheds light on the mechanisms of dopaminergic neuron selective vulnerability. The evidence that activation of dopaminergic neurons causes degeneration and alters mRNA expression is convincing, as the authors use both vehicle and CNO control groups, but the evidence that chronic dopaminergic activation alters circadian rhythm and motor behavior is incomplete as the authors did not run a CNO-control condition in these experiments.

      Strengths:

      This is an exciting and important paper.

      The paper compares mouse transcriptomics with human patient data.

      It shows that selective degeneration can occur across the midbrain dopaminergic neurons even in the absence of a genetic, prion, or toxin neurodegeneration mechanism.

      We thank the reviewer for these insightful comments.

      Weaknesses:

      Major concerns:

      (1) The lack of a CNO-positive, DREADD-negative control group in the behavioral experiments is the main limitation in interpreting the behavioral data. Without knowing whether CNO on its own has an impact on circadian rhythm or motor activity, the certainty that dopaminergic hyperactivity is causing these effects is lacking.

      This is an important point. Although we show that CNO does not produce degeneration of DA neuron terminals, we do not exclude a contribution to the behavioral changes. We agree that this behavioral control is necessary, and will address it in revision with a CNO-only running wheel cohort.

      (2) One of the most exciting things about this paper is that the SNc degenerates more strongly than the VTA when both regions are, in theory, excited to the same extent. However, it is not perfectly clear that both regions respond to CNO to the same extent. The electrophysiological data showing CNO responsiveness is only conducted in the SNc. If the VTA response is significantly reduced vs the SNc response, then the selectivity of the SNc degeneration could just be because the SNc was more hyperactive than the VTA. Electrophysiology experiments comparing the VTA and SNc response to CNO could support the idea that the SNc has substantial intrinsic vulnerability factors compared to the VTA.

      We agree that additional electrophysiology conducted in the VTA dopamine neurons would meaningfully add to our understanding of the selective vulnerability in this model, and will complete these experiments in revision.

      (3) The mice have access to a running wheel for the circadian rhythm experiments. Running has been shown to alter the dopaminergic system (Bastioli et al., 2022) and so the authors should clarify whether the histology, electrophysiology, fiber photometry, and transcriptomics data are conducted on mice that have been running or sedentary.

      We will explicitly clarify which mice had access to a running wheel in our revision. Briefly, mice for histology, electrophysiology, and transcriptomics all had access to a running wheel during their treatment. The mice used for photometry underwent about 7 days of running wheel access approximately 3 weeks prior to the beginning of the experiment. The photometry headcaps sterically prevented mice from having access to a running wheel in their home cage.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, Rademacher and colleagues examined the effect on the integrity of the dopamine system in mice of chronically stimulating dopamine neurons using a chemogenetic approach. They find that one to two weeks of constant exposure to the chemogenetic activator CNO leads to a decrease in the density of tyrosine hydroxylase staining in striatal brain sections and to a small reduction of the global population of tyrosine hydroxylase positive neurons in the ventral midbrain. They also report alterations in gene expression in both regions using a spatial transcriptomics approach. Globally, the work is well done and valuable and some of the conclusions are interesting. However, the conceptual advance is perhaps a bit limited in the sense that there is extensive previous work in the literature showing that excessive depolarization of multiple types of neurons associated with intracellular calcium elevations promotes neuronal degeneration. The present work adds to this by showing evidence of a similar phenomenon in dopamine neurons.

      We thank the reviewer for the careful and thoughtful review of our manuscript.

      While extensive depolarization and associated intracellular calcium elevations promote degeneration generally, we emphasize that the process we describe is novel. Indeed, prior studies delivering chronic DREADDs to vulnerable neurons in models of Alzheimer’s disease did not report an increase in neurodegeneration, despite seeing changes in protein aggregation (e.g. Yuan and Grutzendler, J Neurosci 2016, PMID: 26758850; Hussaini et al., PLOS Bio 2020, PMID: 32822389). Further, a critical finding from our study is that in our paradigm, this stressor does not impact all dopamine neurons equally, as the SNc DA neurons are more vulnerable than the VTA, mirroring selective vulnerability characteristic of Parkinson’s disease. This is consistent with a large body of literature that SNc dopamine neurons are less capable of handling large energetic and calcium loads compared to neighboring VTA neurons, and the finding that chronically altered activity is sufficient to drive this preferential loss is novel.

      In addition, we are not aware of prior studies that have chronically activated DREADDs to produce neurodegeneration. Other studies have shown that acute excitotoxic stressors can produce neuronal degeneration, but the chronic increase in activity is central to our approach.

      In terms of the mechanisms explaining the neuronal loss observed after 2 to 4 weeks of chemogenetic activation, it would be important to consider that dopamine neurons are known from a lot of previous literature to undergo a decrease in firing through a depolarization-block mechanism when chronically depolarized. Is it possible that such a phenomenon explains much of the results observed in the present study? It would be important to consider this in the manuscript.

      As discussed in greater detail in the results section below, our data suggest this may not be a prominent feature in our model. However, we cannot rule out a contribution of depolarization block, and will expand on the discussion of this possibility in the revised manuscript.

      The relevance to Parkinson's disease (PD) is also not totally clear because there is not a lot of previous solid evidence showing that the firing of dopamine neurons is increased in PD, either in human subjects or in mouse models of the disease. As such, it is not clear if the present work is really modelling something that could happen in PD in humans.

      We completely agree that evidence of increased dopamine neuron activity from human PD patients is lacking and the existing data are difficult to interpret without human controls. However, as we outline in the manuscript, multiple lines of evidence suggest that the activity level of dopamine neurons almost certainly does change in PD. Therefore, it is very important that we understand how changes in the level of neural activity influence the degeneration of DA neurons. In this paper we examine the impact of increased activity. Increased activity may be compensatory after initial dopamine neuron loss, or may be an initial driver of death (Rademacher & Nakamura, Exp Neurol 2024, PMID: 38092187). Beyond what is already discussed in the manuscript, additional support for increased activity in PD models includes:

      - Elevated firing rates in asymptomatic MitoPark mice (Good et al., FASEB J 2011, PMID: 21233488)

      - Increased frequency of spontaneous firing in patient-derived iPSC dopamine neurons and primary mouse dopamine neurons that overexpress synuclein (Lin et al., Acta Neuropath Comm 2021, PMID: 34099060)

      - Increased spontaneous firing in dopamine neurons of rats injected with synuclein preformed fibrils compared to sham (Tozzi et al., Brain 2021, PMID: 34297092)

      We will include and further discuss these important examples in our revision.

      Similarly, in future studies, it will also be important to study the impact of decreasing DA neuron activity. There will be additional levels of complexity to accurately model changes in PD, which may differ between subtypes of the disease, the disease stage, and the subtype of dopamine neuron. Our study models the possibility of chronically increased pacemaking, and interpretation of our results will be informed as we learn more about how the activity of DA neurons changes in humans in PD. We will discuss and elaborate on these important points in the revision.

      Comments on the introduction:

      The introduction cites a 1990 paper from the lab of Anthony Grace as support of the fact that DA neurons increase their firing rate in PD models. However, in this 1990 paper, the authors stated that: "With respect to DA cell activity, depletions of up to 96% of striatal DA did not result in substantial alterations in the proportion of DA neurons active, their mean firing rate, or their firing pattern. Increases in these parameters only occurred when striatal DA depletions exceeded 96%." Such results argue that an increase in firing rate is most likely to be a consequence of the almost complete loss of dopamine neurons rather than an initial driver of neuronal loss. The present introduction would thus benefit from being revised to clarify the overriding hypothesis and rationale in relation to PD and better represent the findings of the paper by Hollerman and Grace.

      We agree that the findings of Hollerman and Grace support compensatory changes in dopamine neuron activity in response to loss of dopamine neurons, rather than informing whether dopamine neuron loss can also be an initial driver of activity. We will clarify this point in our revision. In addition, the results of other studies on this point are mixed: a 50% reduction in dopamine neurons didn’t alter firing rate or bursting (Harden and Grace, J Neurosci 1995, PMID: 7666198; Bilbao et al., Brain Res 2006, PMID: 16574080), while a 40% loss was found to increase firing rate and bursting (Chen et al., Brain Res 2009, PMID: 19545547) and larger reductions alter burst firing (Hollerman & Grace, Brain Res 1990, PMID: 2126975; Stachowiak et al., J Neurosci 1987, PMID: 3110381). Importantly, even if compensatory, such late-stage increases in dopamine neuron activity may contribute to disease progression and drive a vicious cycle of degeneration in surviving neurons. In addition, we also don’t know how the threshold of dopamine neuron loss and altered activity may differ between mice and humans, and PD patients do not present with clinical symptoms until ~30-60% of nigral neurons are lost (Burke & O’Malley, Exp Neurol 2013, PMID: 22285449; Shulman et al., Annu Rev Pathol 2011, PMID: 21034221).

      Other lines of evidence support the potential role of hyperactivity in disease initiation, including increased activity before dopamine neuron loss in MitoPark mice (Good et al., FASEB J 2011, PMID: 21233488), increased spontaneous firing in patient-derived iPSC dopamine neurons (Lin et al., Acta Neuropath Comm 2021, PMID: 34099060), and increased activity observed in genetic models of PD (Bishop et al., J Neurophysiol 2010, PMID: 20926611; Regoni et al., Cell Death Dis 2020,  PMID: 33173027).

      It would be good that the introduction refers to some of the literature on the links between excessive neuronal activity, calcium, and neurodegeneration. There is a large literature on this and referring to it would help frame the work and its novelty in a broader context.

      We agree that a discussion of hyperactivity, calcium, and neurodegeneration would benefit the introduction. While we briefly discuss calcium and neurodegeneration in the discussion, we will expand on this literature in both the introduction and discussion sections. We will carefully review and contextualize our work within existing frameworks of calcium and neurodegeneration (e.g. Surmeier & Schumacker, J Biol Chem 2013, PMID: 23086948; Verma et al., Transl Neurodegener 2022, PMID: 35078537). We believe that the novelty of our study lies in 1) a chronic chemogenetic activation paradigm via drinking water, 2) demonstrating selective vulnerability of dopamine neurons as a result of altering their activity/excitability alone, and 3) comparing mouse and human spatial transcriptomics.

      Comments on the results section:

      The running wheel results of Figure 1 suggest that the CNO treatment caused a brief increase in running on the first day after which there was a strong decrease during the subsequent days in the active phase. This observation is also in line with the appearance of a depolarization block.

      The authors examined many basic electrophysiological parameters of recorded dopamine neurons in acute brain slices. However, it is surprising that they did not report the resting membrane potential, or the input resistance. It would be important that this be added because these two parameters provide key information on the basal excitability of the recorded neurons. They would also allow us to obtain insight into the possibility that the neurons are chronically depolarized and thus in depolarization block.

      We do report the input resistance in Supplemental Figure 1C, which was unchanged in CNO-treated animals compared to controls. We did not report the resting membrane potential because many of the DA neurons were spontaneously firing. However, in the revision we will report the initial membrane potential on first breaking into the cell for the whole-cell recordings, which did not vary between groups. This value is still influenced by action potential activity, but it is the timepoint in the recording least affected by dialysis of the neuron by the internal solution. We observed increased spontaneous action potential activity ex vivo in slices from CNO-treated mice (Figure 1D); thus, at least under these conditions, these dopamine neurons are not in depolarization block. We also did not see strong evidence of changes in other intrinsic properties of the neurons with whole-cell recordings (e.g. Figure S1C). Overall, our electrophysiology experiments are not consistent with the depolarization block model, at least not due to changes in the intrinsic properties of the neurons. Although our ex vivo findings cannot exclude a contribution of depolarization block in vivo, we do show that CNO-treated mice removed from their cages for open field testing continue to have a strong trend for increased activity for approximately 10 days (Figure S1E). This finding is also consistent with increased activity of the DA neurons. We will add discussion of these important considerations in the revision.

      It is great that the authors quantified not only TH levels but also the levels of mCherry, co-expressed with the chemogenetic receptor. This could in principle help to distinguish between TH downregulation and true loss of dopamine neuron cell bodies. However, the approach used here has a major caveat in that the number of mCherry-positive dopamine neurons depends on the proportion of dopamine neurons that were infected and expressed the DREADD, and this could very well vary between different mice. It is very unlikely that the virus injection infected 100% of the neurons in the VTA and SNc. This could for example explain in part the mismatch between the number of VTA dopamine neurons counted in panel 2G when comparing TH and mCherry counts. Also, I see that the mCherry counts were not provided at the 2-week time point. If the mCherry had been expressed genetically by crossing the DAT-Cre mice with a floxed fluorescent reporter mouse line, the interpretation would have been simpler. In this context, I am not convinced of the benefit of the mCherry quantifications. The authors should consider either removing these results from the final manuscript or discussing this important limitation.

      We thank the reviewer for this insightful comment, and we agree that this is a caveat of our mCherry quantification. Quantitation of the number of mCherry+ DA neurons specifically informs the impact on transduced DA neurons, and mCherry appears to be less susceptible to downregulation versus TH. As the reviewer points out, it carries the caveat that there is some variability between injections. Nonetheless, we believe that it conveys useful complementary data. As suggested, we will discuss this caveat in our revision. Note that mCherry was not quantified at the two-week timepoint because there is no loss of TH+ cells at that time.

      Although the authors conclude that there is a global decrease in the number of dopamine neurons after 4 weeks of CNO treatment, the post-hoc tests failed to confirm that the decrease in dopamine number was significant in the SNc, the region most relevant to Parkinson's. This could be due to the fact that only a small number of mice were tested. A "n" of just 4 or 5 mice is very small for a stereological counting experiment. As such, this experiment was clearly underpowered at the statistical level. Also, the choice of the image used to illustrate this in panel 2G should be reconsidered: the image suggests that a very large loss of dopamine neurons occurred in the SNc and this is not what the numbers show. A more representative image should be used.

      We agree that the stereology experiments were performed on relatively small numbers of animals. Combined with the small effect size, this may have contributed to the post-hoc tests showing a trend of p=0.1 for both the TH and mCherry dopamine cell counts in the SN at 4 weeks. As part of the planned experiments for our revision, we will perform an additional stereologic analysis to further assess the loss of SNc dopamine neurons. We will also review and ensure the images are representative.

      In Figure 3, the authors attempt to compare intracellular calcium levels in dopamine neurons using GCaMP6 fluorescence. Because this calcium indicator is not quantitative (unlike ratiometric sensors such as Fura2), it is usually used to quantify relative changes in intracellular calcium. The present use of this probe to compare absolute values is unusual and the validity of this approach is unclear. This limitation needs to be discussed. The authors also need to refer in the text to the difference between panels D and E of this figure. It is surprising that the fluctuations in calcium levels were not quantified. I guess the hypothesis was that there should be more or larger fluctuations in the mice treated with CNO if the CNO treatment led to increased firing. This needs to be clarified.

      We thank the reviewer for this comment. We understand that this method of comparing absolute values is unconventional. However, these animals were tested concurrently on the same system, and a clear effect on the absolute baseline was observed. We will note this caveat in our discussion. Panel D of this figure shows the raw, uncorrected photometry traces, whereas panel E shows the isosbestic-corrected traces for the same recording. In panel E, the traces follow time in ascending order. We will also include frequency and amplitude data for these recordings.
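      For readers unfamiliar with the distinction between raw and isosbestic-corrected traces, the following is a minimal sketch of one common correction, not necessarily the procedure used for Figure 3. It assumes a calcium-dependent channel and an isosbestic (calcium-independent) channel sampled at the same time points, fits the isosbestic channel to the signal channel by least squares, and reports the residual as dF/F. The function name `isosbestic_correct` and all synthetic values are hypothetical.

```python
import numpy as np

def isosbestic_correct(signal, isosbestic):
    """Return dF/F of `signal` after removing what `isosbestic` explains.

    Assumption: both channels share slow artifacts (bleaching, motion),
    so a linear fit of the isosbestic channel estimates the baseline.
    """
    signal = np.asarray(signal, dtype=float)
    isosbestic = np.asarray(isosbestic, dtype=float)
    # Least-squares line: signal ~ slope * isosbestic + intercept
    slope, intercept = np.polyfit(isosbestic, signal, deg=1)
    fitted = slope * isosbestic + intercept
    return (signal - fitted) / fitted

# Synthetic demonstration (all numbers are made up for illustration)
rng = np.random.default_rng(0)
t = np.linspace(0.0, 60.0, 1200)                      # seconds
bleaching = 1.0 + 0.2 * np.exp(-t / 30.0)             # artifact shared by both channels
transient = 0.3 * np.exp(-((t - 40.0) ** 2) / 2.0)    # calcium event centered at 40 s
iso = 100.0 * bleaching + rng.normal(0.0, 0.05, t.size)
sig = 150.0 * bleaching * (1.0 + transient) + rng.normal(0.0, 0.05, t.size)

dff = isosbestic_correct(sig, iso)
peak_time = t[np.argmax(dff)]
```

      Note that this correction yields a relative measure by construction, which is why comparisons of absolute baseline fluorescence between animals require the additional caveats discussed above.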

      Although the spatial transcriptomic results are intriguing and certainly a great way to start thinking about how the CNO treatment could lead to the loss of dopamine neurons, the presented results, which focus on some broad classes of differentially expressed genes and on some specific examples, do not really suggest any clear mechanism of neurodegeneration. It would perhaps be useful for the authors to use the obtained data to validate that a state of chronic depolarization was indeed induced by the chronic CNO treatment. Were genes classically linked to increased activity like cfos or bdnf elevated in the SNc or VTA dopamine neurons? In the striatum, the authors report that the levels of DARPP-32, a gene whose levels are linked to dopamine levels, are unchanged. Does this mean that there were no major changes in dopamine levels in the striatum of these mice?

      We will review the expression of activity-related genes in our dataset, although we must keep in mind that these genes may behave differently in the context of chronic activation as opposed to acutely increased activity. We will also include experiments assessing striatal dopamine levels by HPLC in the revision.

      The usefulness of comparing the transcriptome of human PD SNc or VTA sections to that of the present mouse model should be better explained. In the human tissues, the transcriptome reflects the state of the tissue many years after extensive loss of dopamine neurons. It is expected that there will be few if any SNc neurons left in such sections. In comparison, the mice after 7 days of CNO treatment do not appear to have lost any dopamine neurons. As such, how can the two extremely different conditions be reasonably compared?

      Our mouse model and human PD progress over distinct timescales, as is the case with essentially all mouse models of neurodegenerative diseases. Nonetheless, in our view there is still great value in comparing gene expression changes in mouse models with those in human disease. It seems very likely that the same pathologic processes that drive degeneration early in the disease continue to drive degeneration later in the disease. Note that we have tried to address the discrepancy in time scales in part by comparing to early PD samples, in which SNc DA neuron loss is more limited. Please note the numbers of DA neurons within the areas we have selected for sampling (Author response image 1). Therefore, we can indeed use spatial transcriptomics to compare dopamine neurons from mice with initial degeneration and patients in whom degeneration is ongoing during their disease.

      Author response image 1.

      Violin plot of DA neuron proportions sampled within the vulnerable SNV (deconvoluted RCTD method used in unmasked tissue sections of the SNV). Control and early PD subjects.

      Comments on the discussion:

      In the discussion, the authors state that their calcium photometry results support a central role of calcium in activity-induced neurodegeneration. This conclusion, although plausible because of the very broad pre-existing literature linking calcium elevation (such as in excitotoxicity) to neuronal loss, should be toned down a bit as no causal relationship was established in the experiments that were carried out in the present study.

      Our model utilizes hM3Dq-DREADDs that function by increasing intracellular calcium to increase neuronal excitability, and our results show increased Ca2+ by fiber photometry and changes to Ca2+-related genes, strongly suggesting a causal relation and crucial role of calcium in the mechanism of degeneration. However, we agree that we have not experimentally proven this point, as we acknowledged in the text. Additionally, we have planned revision experiments involving chronic isradipine treatment to further test the role of calcium in the mechanism of degeneration in this model.

      In the discussion, the authors discuss some of the parallel changes in gene expression detected in the mouse model and in the human tissues. Because few if any dopamine neurons are expected to remain in the SNc of the human tissues used, this sort of comparison has important conceptual limitations and these need to be clearly addressed.

      As discussed, we can sample SN DA neurons in early PD (see figure above), and in our view there is great value for such comparisons. We agree that discussion of appropriate caveats is warranted and this will be clearly addressed in the revision.

      A major limitation of the present discussion is that it does not discuss the possibility that the observed phenotypes are caused by the induction of a chronic state of depolarization block by the chronic CNO treatment. I encourage the authors to consider and discuss this hypothesis.

      As discussed above, our analyses of DA neuron firing in slices and open field testing to date do not support a prominent contribution of depolarization block with chronic CNO treatment. However, we cannot rule out this hypothesis; therefore, we will include additional electrophysiology experiments and add discussion of this important consideration.

      Also, the authors need to discuss the fact that previous work was only able to detect an increase in the firing rate of dopamine neurons after more than 95% loss of dopamine neurons. As such, the authors need to clearly discuss the relevance of the present model to PD. Are changes in firing rate a driver of neuronal loss in PD, as the authors try to make the case here, or are such changes only a secondary consequence of extensive neuronal loss (for example because a major loss of dopamine would lead to reduced D2 autoreceptor activation in the remaining neurons, and to reduced autoreceptor-mediated negative feedback on firing). This needs to be discussed.

      As discussed above, while increases in dopamine neuron activity may be compensatory after loss of neurons, the precise percentage required to induce such compensatory changes is not defined in mice and varies between paradigms, and the threshold level is not known in humans. We also reiterate that a compensatory increase in activity could still promote the degeneration of critical surviving DA neurons, whose loss underlies the substantial decline in motor function that typically occurs over the course of PD. Moreover, there are also multiple lines of evidence to suggest that changes in activity can initiate and drive dopamine neuron degeneration (Rademacher & Nakamura, Exp Neurol 2024). For example, overexpression of synuclein can increase firing in cultured dopamine neurons (Dagra et al., NPJ Parkinsons Dis 2021, PMID: 34408150) while mice expressing mutant Parkin have higher mean firing rates (Regoni et al., Cell Death Dis 2020,  PMID: 33173027). Similarly, an increased firing rate has been reported in the MitoPark mouse model of PD at a time preceding DA neuron degeneration (Good et al., FASEB J 2011, PMID: 21233488). We also acknowledge that alterations to dopamine neuron activity are likely complex in PD, and that dopamine neuron health and function can be impacted not just by simple increases in activity, but also by changes in activity patterns and regularity. We will amend our discussion to include the important caveat of changes in activity occurring as compensation, as well as further evidence of changes in activity preceding dopamine neuron death.

      There is a very large, multi-decade literature on calcium elevation and its effects on neuronal loss in many different types of neurons. The authors should discuss their findings in this context and refer to some of this previous work. In a nutshell, the observations of the present manuscript could be summarized by stating that the chronic membrane depolarization induced by the CNO treatment is likely to induce a chronic elevation of intracellular calcium and this is then likely to activate some of the well-known calcium-dependent cell death mechanisms. Whether such cell death is linked in any way to PD is not really demonstrated by the present results. The authors are encouraged to perform a thorough revision of the discussion to address all of these issues, discuss the major limitations of the present model, and refer to the broad pre-existing literature linking membrane depolarization, calcium, and neuronal loss in many neuronal cell types.

      While our model demonstrates classic excitotoxic cell death pathways, we would like to emphasize both the chronic nature of our manipulation and the progressive changes observed, with increasing degeneration seen at 1, 2, and 4 weeks of hyperactivity in an axon-first manner. This is a unique aspect of our study, in contrast to much of the previous literature which has focused on shorter timescales. Thus, while we will revise the discussion to more comprehensively acknowledge previous studies of calcium-dependent neuron cell death, we believe we have made several new contributions that are not predicted by existing literature. We have shown that this chronic manipulation is specifically toxic to nigral dopamine neurons, and the data that VTA dopamine neurons continue to be resilient even at 4 weeks is interesting and disease-relevant. We therefore do not want to use findings from other neuron types to draw assumptions about DA neurons, which are a unique and very diverse population. We acknowledge that as with all preclinical models of PD, we cannot draw definitive conclusions about PD with this data. However, we reiterate that we strongly believe that drawing connections to human disease is important, as dopamine neuron activity is very likely altered in PD and a clearer understanding of how dopamine neuron survival is impacted by activity will provide insight into the mechanisms of PD.

    1. Author response:

      Reviewer #1 (Public review):

      From the Reviewing Editor:

      Four reviewers have assessed your manuscript on valence and salience signaling in the central amygdala. There was universal agreement that the question being asked by the experiment is important. There was consensus that the neural population being examined (GABA neurons) was important and that the circular shift method for identifying task-responsive neurons was rigorous. Indeed, observing valenced outcome signaling in GABA neurons would considerably increase the role of the central amygdala in valence. However, each reviewer brought up significant concerns about the design, analysis and interpretation of the results. Overall, these concerns limit the conclusions that can be drawn from the results. Addressing the concerns (described below) would work towards better answering the question at the outset of the experiment: how does the central amygdala represent salience vs valence?

      A weakness noted by all reviewers was the use of the terms 'valence' and 'salience' as well as the experimental design used to reveal these signals. The two outcomes used emphasized non-overlapping sensory modalities and produced unrelated behavioral responses. Within each modality there are no manipulations that would scale either the value of the valenced outcomes or the intensity of the salient outcomes. While the food outcomes were presented many times (20 times per session over 10 sessions of appetitive conditioning) the shock outcomes were presented many fewer times (10 times in a single session). The large difference in presentations is likely to further distinguish the two outcomes. Collectively, these experimental design decisions meant that any observed differences in central amygdala GABA neuron responding are unlikely to reflect valence, but likely to reflect one or more of the above features.

      We appreciate the reviewers’ comments regarding the experimental design. When assessing fear versus reward, we chose stimuli that elicit known behavioral responses, freezing versus consumption. The use of stimuli of the same modality is unlikely to elicit easily definable fear or reward responses or to be precisely matched for sensory intensity. For example, sweet or bitter tastes can be used, but even these activate different taste receptors and vary in the duration of the activation of taste-specific signaling (e.g. how long the taste lingers in the mouth). The approach we employed is similar to that of Yang et al., 2023 (doi: 10.1038/s41586-023-05910-2), who used water reward and shock to characterize the response profiles of somatostatin neurons of the central amygdala. Similar to what was reported by Yang and colleagues, we observed that the majority of CeA GABA neurons responded selectively to one unconditioned stimulus (~52%). We observed that 15% of neurons responded in the same direction, either activated or inhibited, to the food or shock US. These were defined as salience encoding based on the definitions of Lin and Nicolelis, 2008 (doi: 10.1016/j.neuron.2008.04.031), in which basal forebrain neurons responded similarly to reward or punishment irrespective of valence. The designation of valence encoding based on opposite responses to the food or shock is straightforward (~10% of cells); however, we agree that the designation of modality-specific encoding neurons as valence encoding is less straightforward.

      A second weakness noted by a majority of reviewers was a lack of cue-responsive units and a lack of exploration of the diversity of response types and of the relationship between cue and outcome firing. The lack of large numbers of neurons increasing firing to one or both cues is particularly surprising given the critical contribution of central amygdala GABA neurons to the acquisition of conditioned fear (which the authors measured) as well as to conditioned orienting (which the authors did not measure). Regression-like analyses would be a straightforward means of identifying neurons varying their firing in accordance with these or other behaviors. It was also noted that appetitive behavior was not measured in a rigorous way. Instead of measuring time near the hopper, measures of licking would have been better. Further, measures of orienting behaviors such as startle were missing.

      The authors also missed an opportunity for clustering-like analyses, which could have been used to reveal neurons uniquely signaling cues, outcomes or combinations of cues and outcomes. If the authors' calcium imaging approach is not able to detect expected central amygdala cue responding, might it be missing other critical aspects of responding?

      As stated in the manuscript, we were surprised by the relatively low number of cue-responsive cells; however, when using a less stringent statistical method (Figure 5 - Supplement 2), we observed that 13% of neurons responded to the food-associated cue and 23% responded to the shock-associated cue. The differences are therefore likely a reflection of the rigor of the statistical measure used to define the responsive units. The number of CS-responsive units is lower than reported in the CeAl by Ciocchi et al., 2010 (doi: 10.1038/nature09559), who observed 30% activated by the CS and 25% inhibited, but is not that dissimilar from the results of Duvarci et al., 2011 (doi: 10.1523/JNEUROSCI.4985-10.2011), who observed 11% activated in the CeAl and 25% inhibited by the CS. These numbers are also consistent with previous single-cell calcium imaging of cell types in the CeA. For example, Yang et al., 2023 (doi: 10.1038/s41586-023-05910-2) observed that 13% of somatostatin neurons responded to a reward CS and 8% responded to a shock CS. Yu et al., 2017 (doi: 10.1038/s41593-017-0009-9) observed that 26.5% of PKCdelta neurons responded to the shock CS. It should also be noted that our analysis was not restricted to the CeAl. Finally, food learning was assessed in an operant chamber in freely moving mice with reward pellet delivery. Because liquids were not used for the reward US, licking is not a metric that can be used.

      All reviewers point out that the evidence for salience encoding is even more limited than the evidence for valence. Although the specific concern for each reviewer varied, they all centered on an oversimplistic definition of salience. Salience ought to scale with the absolute value and intensity of the stimulus. Salience cannot simply be responding in the same direction. Further, even though the authors observed subsets of central amygdala neurons increasing or decreasing activity to both outcomes - the outcomes can readily be distinguished based on the temporal profile of responding.

      We thank the reviewers for their comments relating to the definition of salience and valence encoding by central amygdala neurons. We have addressed each of the concerns below.

      Additional concerns are raised by each reviewer. Our consensus is that this study sought to answer an important question: whether the central amygdala signals salience or valence in cue-outcome learning. However, the experimental design, analyses, and interpretations do not permit a rigorous and definitive answer to that question. Such an answer would require additional experiments whose designs would address the significant concerns described here. Fully addressing the concerns of each reviewer would result in a re-evaluation of the findings. For example, an experimental design that better reveals valence and salience, and analyses describing the diversity of neuronal responding and its relationship to behavior, would likely make the results Important or even Fundamental.

      We appreciate the reviewers’ comments and have addressed each concern below.

      Reviewer #2 (Public review):

      In this article, Kong and colleagues sought to determine the encoding properties of central amygdala (CeA) neurons in response to oppositely valenced stimuli and cues predicting those stimuli. The amygdala and its subregional components have historically been understood to be regions that encode associative information, including valenced stimuli. The authors performed calcium imaging of GABA-ergic CeA neurons in freely-moving mice conditioned in Pavlovian appetitive and fear paradigms, and showed that CeA neurons are responsive to both appetitive and aversive unconditioned and conditioned stimuli. They used a variant of a previously published 'circular shifting' technique (Harris, 2021), which allowed them to distinguish excited, non-responsive, and inhibited neurons. While there is considerable overlap of CeA neurons responding to both unconditioned stimuli (in this case, food and shock, deemed "salience-encoding" neurons), there are considerably fewer CeA neurons that respond to both conditioned stimuli that predict the food and shock. The authors finally demonstrated that there are no differences due to the order of Pavlovian paradigms (food - shock vs. shock - food), which is an interesting result, and convincingly presented given their counterbalanced experimental design.
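      To make the logic of a circular-shift test concrete, here is a minimal, generic sketch. It is not the authors' implementation or the exact variant in Harris (2021), whose details are not given here. It assumes a single neuron's activity trace and a list of event onset indices, builds a null distribution by rotating the whole trace by random lags (which preserves its autocorrelation), and labels the neuron by a two-sided test. The function name, window length, number of shifts, and threshold are all hypothetical choices.

```python
import numpy as np

def circular_shift_test(trace, event_idx, window, n_shifts=1000, alpha=0.05, seed=0):
    """Label one neuron 'excited', 'inhibited', or 'non-responsive'."""
    rng = np.random.default_rng(seed)
    trace = np.asarray(trace, dtype=float)
    event_idx = np.asarray(event_idx)
    offsets = np.arange(window)

    def mean_response(x):
        # Mean activity in the post-event window, pooled over all events
        return x[event_idx[:, None] + offsets].mean()

    observed = mean_response(trace)
    # Null distribution: rotate the trace relative to the fixed event times
    shifts = rng.integers(window, trace.size - window, size=n_shifts)
    null = np.array([mean_response(np.roll(trace, s)) for s in shifts])
    # One-sided permutation p-values with the usual +1 correction
    p_high = (np.sum(null >= observed) + 1) / (n_shifts + 1)
    p_low = (np.sum(null <= observed) + 1) / (n_shifts + 1)
    if p_high < alpha / 2:
        return "excited"
    if p_low < alpha / 2:
        return "inhibited"
    return "non-responsive"

# Synthetic demonstration with irregularly spaced events (made-up values)
rng = np.random.default_rng(1)
n_samples, window = 6000, 20
events = np.cumsum(rng.integers(200, 400, size=14))   # all events end before n_samples
in_window = np.zeros(n_samples)
in_window[(events[:, None] + np.arange(window)).ravel()] = 1.0
noise = rng.normal(0.0, 1.0, n_samples)

excited_label = circular_shift_test(noise + 2.0 * in_window, events, window)
inhibited_label = circular_shift_test(noise - 2.0 * in_window, events, window)
```

      Because the null is built from the same trace, a stricter threshold or a larger number of shifts directly reduces the number of neurons labeled responsive, which is relevant to the discussion of how few cue-responsive units were detected.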

      In total, I find the presented study useful in understanding the dynamics of CeA neurons during a Pavlovian learning paradigm. There are many strengths of this study, including the important question and clear presentation, the circular shifting analysis was convincing to me, and the manuscript was well written. We hope the authors will find our comments constructive if they choose to revise their manuscript.

      While the experiments and data are of value, I do not agree with the authors' interpretation of their data, and take issue with the way they used the terms "salience" and "valence": their operational definitions differ from my reading of the literature (I would encourage them to check out Namburi et al., NPP, 2016). To be fair, a recent study from another group that reports experiments and findings very similar to the ones in the present study (Yang et al., 2023, describing valence coding in the CeA using a similar approach) also uses the terms valence and salience in a rather liberal way that I would also have issues with (see below). Either new experiments or revised claims would be needed here, and a more balanced discussion of this topic would be nice to see. I also felt that there were some aspects of novelty in this study that could be better highlighted (see below).

      One noteworthy point of alarm is that it seems as if two data panels including heatmaps are duplicated (perhaps that panel G of Figure 5-figure supplement 2 is a cut and paste error? It is duplicated from panel E and does not match the associated histogram).

      We thank the reviewer for their insightful comments and assessment of the manuscript.

      Major concerns:

      (1) The authors wish to make claims about salience and valence. This is my biggest gripe, so I will start here.

      (1a) Valence scales for positive and negative stimuli. As stated in Namburi et al., NPP, 2016, we operationalize "valence" as having different responses for positive and negative values and no response for stimuli that are not motivationally significant (neutral cues that do not predict an outcome). The threshold for claiming salience, which we define as scaling with the absolute value of the stimulus and not responding to a neutral stimulus (Namburi et al., NPP, 2016; Tye, Neuron, 2018; Li et al., Nature, 2022), would require the lack of response to a neutral cue.

      We appreciate the reviewer’s comment on the definitions of salience and valence and agree that there is not a consistent classification of these response types in the field. As stated above, we used the designation of salience encoding if the cells respond in the same direction to different stimuli regardless of the valence of the stimulus, similar to what was described previously (Lin and Nicolelis, 2008, doi: 10.1016/j.neuron.2008.04.031). Similar definitions of salience have also been reported elsewhere (for examples see: Stephenson-Jones et al., 2020, doi: 10.1016/j.neuron.2019.12.006, Zhu et al., 2018, doi: 10.1126/science.aat0481, and Comoli et al., 2003, doi: 10.1038/nn1113P). Per the suggestion of the reviewer, we longitudinally tracked cells on the first day of Pavlovian reward conditioning and the fear conditioning day. Although there were considerably fewer head entries on the first day of reward conditioning, we were able to identify 10 cells that were activated by both the food US and shock US. We compared the responses to the first five and last five head entries and to the first five and last five shocks. Consistent with what has been reported for salience encoding neurons in the basal forebrain (Lin and Nicolelis, 2008, doi: 10.1016/j.neuron.2008.04.031), we observed that the responses were highest when the US was most unexpected and decreased in later trials.

      Author response image 1.

      (1b) The other major issue is that the authors choose to make claims about the neural responses to the USs rather than the CSs. However, being shocked and receiving sucrose also would have very different sensorimotor representations, and any differences in responses could be attributed to those confounds rather than valence or salience. They could make claims regarding salience or valence with respect to the differences in the CSs but they should restrict analysis to the period prior to the US delivery.

      Perhaps the reviewer missed this, but analysis of valence and salience encoding for the different CSs is presented in Figure 5G, Figure 5 - Supplement 1C-D, and Figure 5 - Supplement 2N-O. Responsiveness to CSFood and CSShock was analyzed during the conditioning sessions (Figure 3E-F, Figure 4B-C, Figure 5 - Supplement 2J-O, and Figure 5 - Supplement 3K-L) and during recall probe tests for both CSFood and CSShock (Figure 5 - Supplement 1C-J).

      (1c) The third obstacle to using the terms "salience" or "valence" is the lack of scaling, which is perhaps a bigger ask. At minimum either the scaling or the neutral cue would be needed to make claims about valence or salience encoding. Perhaps the authors disagree - that is fine. But they should at least acknowledge that there is literature that would say otherwise.

      (1d) In order to make claims about valence, the authors must take into account the sensory confound of the modality of the US (also mentioned in Namburi et al., 2016). The claim that these CeA neurons are indeed valence-encoding (based on their responses to the unconditioned stimuli) is confounded by the fact that the appetitive US (food) is a gustatory stimulus while the aversive US (shock) is a tactile stimulus.

      We provided the same analysis for the US and CS. The US responses were larger and more prevalent, but similar types of encoding were observed for the CS. We agree that the food reward and the shock are very different sensory modalities. As stated above, the use of stimuli of the same modality is unlikely to elicit easily definable fear or reward responses or to be precisely matched for sensory intensity. We agree that it is difficult to determine whether cells that respond to only one stimulus encode valence or are instead specific for the sensory modality, and that without scaling of the stimulus it is difficult to fully address this issue. It should be noted, however, that if the cells in the CeA were exclusively tuned to stimuli of different sensory modalities, we would expect to see a similar number of cells responding to the CS tones (auditory) as respond to the food (taste) and shock (somatosensory), but we do not. Of the cells tracked longitudinally, 80% responded to the USs, with 65% of cells responding to food (activated or inhibited) and 44% responding to shock (activated or inhibited).

      (2) Much of the central findings in this manuscript have been previously described in the literature. Yang et al., 2023 for instance shows that the CeA encodes salience (as demonstrated by the scaled responses to the increased value of unconditioned stimuli, Figure 1 j-m), and that learning amplifies responsiveness to unconditioned stimuli (Figure 2). It is nice to see a reproduction of the finding that learning amplifies CeA responses, though one study is in SST::Cre and this one in VGAT::cre - perhaps highlighting this difference could maximize the collective utility for the scientific community?

      We agree that the analysis performed here is similar to what was conducted by Yang et al., 2023, with the major difference being the types of neurons sampled. Yang et al. imaged only somatostatin neurons, whereas we recorded all GABAergic cell types within the CeA. Moreover, because we imaged from 10 mice, we sampled neurons that ostensibly covered the entire dorsal to ventral extent of the CeA (Figure 1 – Supplement 1). Remarkably, we found that the vast majority of CeA neurons (80%) are responsive to food or shock. Within this 80% there are 8 distinct response profiles, consistent with the heterogeneity of cell types within the CeA based on connectivity, electrophysiological properties, and gene expression. Moreover, we did not find any spatial distinction between food or shock responsive cells, with the responsive cell types being intermingled throughout the dorsal to ventral axis (Figure 5 – Supplement 3).

      (3) There is at least one instance of copy-paste error in the figures that raised alarm. In the supplementary information (Figure 5- figure supplement 2 E;G), the heat maps for food-responsive neurons and shock-responsive neurons are identical. While this almost certainly is a clerical error, the authors would benefit from carefully reviewing each figure to ensure that no data is incorrectly duplicated.

      We thank the reviewer for catching this error. It has been corrected.

      (4) The authors describe experiments to compare shock and reward learning; however, there are temporal differences in what they compare in Figure 5. The authors compare the 10th day of reward learning with the 1st day of fear conditioning, which effectively represent different points of learning and retrieval. At the end of reward conditioning, animals are utilizing a learned association to the cue, which demonstrates retrieval. On the day of fear conditioning, animals are still learning the cue at the beginning of the session, but they are not necessarily retrieving an association to a learned cue. The authors would benefit from recording at a later timepoint (to be consistent with reward learning- 10 days after fear conditioning), to more accurately compare these two timepoints. Or perhaps, it might be easier to just make the comparison between Day 1 of reward learning and Day 1 of fear learning, since they must already have these data.

      We agree that there are temporal differences between the food and shock US deliveries. This is likely a reflection of the fact that the shock delivery is passive and easily resolved based on the time of the US delivery, whereas the food responses are variable because they are dependent upon the consumption of the sucrose pellet. Because of these differences, the kinetics of the responses cannot be accurately compared. This is why we restricted our analysis to whether the cells were food or shock responsive. Aside from reporting the temporal differences in the signals, we did not draw major conclusions about the differences in kinetics. In our experimental design, we counterbalanced which animals received fear conditioning first and then food conditioning, or food conditioning and then fear conditioning, to ensure that order effects did not influence the outcome of the study. It is widely known that Pavlovian fear conditioning can facilitate the acquisition of conditioned stimulus responses with just a single day of conditioning. In contrast, Pavlovian reward conditioning generally progresses more slowly. Because of this, we restricted our analysis to the last day of reward conditioning and the first and only day of fear conditioning. However, as stated above, we compared the responses of neurons defined as salience encoding during day 1 of reward conditioning and fear conditioning. As would be predicted based on previous definitions of salience encoding (Lin and Nicolelis, 2008, doi: 10.1016/j.neuron.2008.04.031), we observed that the responses were highest when the US was most unexpected.

      (5) The authors make a claim of valence encoding in their title and throughout the paper, which is not possible to make given their experimental design. However, they would greatly benefit from actually using a decoder to demonstrate their encoding claim (decoding performance for shock-food versus shuffled labels) and simply make claims about decoding food-predictive cues and shock-predictive cues. Interestingly, it seems like relatively few CeA neurons actually show differential responses to the food and shock CSs, and that is interesting in itself.

      As stated above, valence and salience encoding were defined similarly to what has been previously reported (Li et al., 2019, doi: 10.7554/eLife.41223; Yang et al., 2023, doi: 10.1038/s41586-023-05910-2; Huang et al., 2024, doi: 10.1038/s41586-024-07819; Lin and Nicolelis, 2008, doi: 10.1016/j.neuron.2008.04.031; Stephenson-Jones et al., 2020, doi: 10.1016/j.neuron.2019.12.006; Zhu et al., 2018, doi: 10.1126/science.aat0481; and Comoli et al., 2003, doi: 10.1038/nn1113P). Interestingly, many of these studies did not vary the US intensity.

      Reviewer #3 (Public review):

      Summary:

      In their manuscript entitled Kong and colleagues investigate the role of distinct populations of neurons in the central amygdala (CeA) in encoding valence and salience during both appetitive and aversive conditioning. The study expands on the work of Yang et al. (2023), which specifically focused on somatostatin (SST) neurons of the CeA. Thus, this study broadens the scope to other neuronal subtypes, demonstrating that CeA neurons in general are predominantly tuned to valence representations rather than salience.

      We thank the reviewer for their insightful comments and assessment of the manuscript.

      Strengths:

      One of the key strengths of the study is its rigorous quantitative approach based on the "circular-shift method", which carefully assesses correlations between neural activity and behavior-related variables. The authors' findings that neuronal responses to the unconditioned stimulus (US) change with learning are consistent with previous studies (Yang et al., 2023). They also show that the encoding of positive and negative valence is not influenced by prior training order, indicating that prior experience does not affect how these neurons process valence.

      Weaknesses:

      However, there are limitations to the analysis, including the lack of population-based analyses, such as clustering approaches. The authors do not employ hierarchical clustering or other methods to extract meaning from the diversity of neuronal responses they recorded. Clustering-based approaches could provide deeper insights into how different subpopulations of neurons contribute to emotional processing. Without these methods, the study may miss patterns of functional specialization within the neuronal populations that could be crucial for understanding how valence and salience are encoded at the population level.

      We appreciate the reviewer’s comments regarding clustering-based approaches. In order to classify cells as responsive to the US or CS we chose to develop a statistically rigorous method for classifying cell response types. Using this approach, we were able to define cell responses to the US and CS. Importantly, we identified 8 distinct response types to the USs. It is not clear how additional clustering analysis would improve cell classifications.

      Furthermore, while salience encoding is inferred based on responses to stimuli of opposite valence, the study does not test whether these neuronal responses scale with stimulus intensity-a hallmark of classical salience encoding. This limits the conclusions that can be drawn about salience encoding specifically.

      As stated above, we used salience classifications similar to those previously described (Lin and Nicolelis, 2008, doi: 10.1016/j.neuron.2008.04.031; Stephenson-Jones et al., 2020, doi: 10.1016/j.neuron.2019.12.006; Zhu et al., 2018, doi: 10.1126/science.aat0481; and Comoli et al., 2003, doi: 10.1038/nn1113P). We agree that varying the stimulus intensity would provide a more rigorous assessment of salience encoding; however, several of the studies mentioned above classify cells as salience encoding without varying stimulus intensity. Additionally, the inclusion of recordings with varying US intensities on top of the Pavlovian reward and fear conditioning would further decrease the number of cells that can be longitudinally tracked and would likely decrease the number of cells that could be classified.

      In sum, while the study makes valuable contributions to our understanding of CeA function, the lack of clustering-based population analyses and the absence of intensity scaling in the assessment of salience encoding are notable limitations.

      Reviewer #4 (Public review):

      Summary:

      The authors have performed endoscopic calcium recordings of individual CeA neuron responses to food and shock, as well as to cues predicting food and shock. They claim that a majority of neurons encode valence, with a substantial minority encoding salience.

      Strengths:

      The use of endoscopic imaging is valuable, as it provides the ability to resolve signals from single cells, while also being able to track these cells across time. The recordings appear well-executed, and employ a sophisticated circular shifting analysis to avoid statistical errors caused by correlations between neighboring image pixels.

      Weaknesses:

      My main critique is that the authors didn't fully test whether neurons encode valence. While it is true that they found CeA neurons responding to stimuli that have positive or negative value, this by itself doesn't indicate that valence is the primary driver of neural activity. For example, they report that a majority of CeA neurons respond selectively to either the positive or negative US, and that this is evidence for "type I" valence encoding. However, it could also be the case that these neurons simply discriminate between motivationally relevant stimuli in a manner unrelated to valence per se. A simple test of this would be to check if neural responses generalize across more than one type of appetitive or aversive stimulus, but this was not done. The closest the authors came was to note that a small number of neurons respond to CS cues, of which some respond to the corresponding US in the same direction. This is relegated to the supplemental figures (3 and 4), and it is not noted whether the same-direction CS-US neurons are also valence-encoding with respect to different USs. For example, are the neurons excited by CS-food and US-food also inhibited by shock? If so, that would go a long way toward classifying at least a few neurons as truly encoding valence in a generalizable way.

      As stated above, valence and salience encoding were defined similarly to what has been previously reported (Li et al., 2019, doi: 10.7554/eLife.41223; Yang et al., 2023, doi: 10.1038/s41586-023-05910-2; Huang et al., 2024, doi: 10.1038/s41586-024-07819; Lin and Nicolelis, 2008, doi: 10.1016/j.neuron.2008.04.031; Stephenson-Jones et al., 2020, doi: 10.1016/j.neuron.2019.12.006; Zhu et al., 2018, doi: 10.1126/science.aat0481; and Comoli et al., 2003, doi: 10.1038/nn1113P). As reported in Figure 5 and Figure 5 – Supplement 3, ~29% of CeA neurons responded to both food and shock USs (15% in the same direction and 13.5% in the opposite direction). In contrast, only 6 of 303 cells responded to both the CSfood and CSshock, all in the same direction.

      A second and related critique is that, although the authors correctly point out that definitions of salience and valence are sometimes confused in the existing literature, they then go on themselves to use the terms very loosely. For example, the authors define these terms in such a way that every neuron that responds to at least one stimulus is either salience or valence-encoding. This seems far too broad, as it makes essentially unfalsifiable their assertion that the CeA encodes some mixture of salience and valence. I already noted above that simply having different responses to food and shock does not qualify as valence-encoding. It also seems to me that having same-direction responses to these two stimuli similarly does not qualify a neuron as encoding salience. Many authors define salience as being related to the ability of a stimulus to attract attention (which is itself a complex topic). However, the current paper does not acknowledge whether they are using this, or any other definition of salience, nor is this explicitly tested, e.g. by comparing neural response magnitudes to any measure of attention.

      As stated in response to reviewer 2, we longitudinally tracked cells across the first day of Pavlovian reward conditioning and the fear conditioning day. Although there were considerably fewer head entries on the first day of reward conditioning, we were able to identify 10 cells that were activated by both the food US and shock US. We compared the responses to the first five and last five head entries, and to the first five and last five shocks. Consistent with what has been reported for salience encoding neurons in the basal forebrain (Lin and Nicolelis, 2008, doi: 10.1016/j.neuron.2008.04.031), we observed that the responses were highest when the US was most unexpected and decreased in later trials.

      The impression I get from the authors' data is that CeA neurons respond to motivationally relevant stimuli, but in a way that is possibly more complex than what the authors currently imply. At the same time, they appear to have collected a large and high-quality dataset that could profitably be made available for additional analyses by themselves and/or others.

      Lastly, the use of 10 daily sessions of training with 20 trials each seems rather low to me. In our hands, Pavlovian training in mice requires considerably more trials in order to effectively elicit responses to the CS. I wonder if the relatively sparse training might explain the relative lack of CS responses?

      It is possible that learning would have occurred more quickly if we had used greater than 20 trials per session. However, we routinely used 20-25 trials for Pavlovian reward conditioning (doi: 10.1073/pnas.1007827107; doi: 10.1523/JNEUROSCI.5532-12.2013; doi: 10.1016/j.neuron.2013.07.044; and doi: 10.1016/j.neuron.2019.11.024).

    1. Author Response

      Response to Reviewer 1:

      Summary of what the author was trying to achieve: In this study, the author aimed to develop a method for estimating neuronal-type connectivity from transcriptomic gene expression data, specifically from mouse retinal neurons. They sought to develop an interpretable model that could be used to characterize the underlying genetic mechanisms of circuit assembly and connectivity.

      Strengths: The proposed bilinear model draws inspiration from commonly implemented recommendation systems in the field of machine learning. The author presents the model clearly and addresses critical statistical limitations that may weaken the validity of the model such as multicollinearity and outliers. The author presents two formulations of the model for separate scenarios in which varying levels of data resolution are available. The author effectively references key work in the field when establishing assumptions that affect the underlying model and subsequent results. For example, correspondence between gene expression cell types and connectivity cell types from different references are clearly outlined in Tables 1-3. The model training and validation are sufficient and yield a relatively high correlation with the ground truth connectivity matrix. Seemingly valid biological assumptions are made throughout, however, some assumptions may reduce resolution (such as averaging over cell types), thus missing potentially important single-cell gene expression interactions.

      Thank you for acknowledging the strengths of this work. The assumption to average gene expression data across individual cells within a given cell type was made in response to the inherent limitations of, for example, the mouse retina dataset, where individual cell-level connectivity and gene expression data are not profiled jointly (the second scenario in our paper). This approach was a necessary compromise to facilitate the analysis at the cell type level. However, in datasets where individual cell-level connectivity and gene expression data are matched, such as the C.elegans dataset referenced below, our model can be applied to achieve single-cell resolution (the first scenario in our paper), offering a more detailed understanding of genetic underpinnings in neuronal connectivity.
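      The cell-type averaging described above can be sketched in a few lines. This is a minimal illustration, not the preprocessing code used in the manuscript: the function name `type_mean_expression`, the toy expression values, and the type labels `BC1` and `BC2` are all hypothetical.

```python
import numpy as np


def type_mean_expression(expr, labels):
    """Average single-cell expression within each cell type.

    expr   : array of shape (n_cells, n_genes)
    labels : array of length n_cells giving each cell's type
    Returns the sorted unique types and a (n_types, n_genes) matrix of means.
    """
    expr = np.asarray(expr, dtype=float)
    labels = np.asarray(labels)
    types = np.unique(labels)  # sorted unique type names
    means = np.vstack([expr[labels == t].mean(axis=0) for t in types])
    return types, means


# Toy data (hypothetical): three cells, two genes, two cell types.
expr = np.array([[1.0, 0.0],
                 [3.0, 2.0],
                 [0.0, 4.0]])
labels = np.array(["BC1", "BC1", "BC2"])

types, means = type_mean_expression(expr, labels)
print(types)
print(means)
```

      Averaging in this way trades single-cell variability for a type-level profile that can be matched to type-level connectivity, which is the compromise the response describes for the second scenario.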

      Weaknesses: The main results of the study could benefit from replication in another dataset beyond mouse retinal neurons, to validate the proposed method. Dimensionality reduction significantly reduces the resolution of the model and the PCA methodology employed is largely non-deterministic. This may reduce the resolution and reproducibility of the model. It may be worth exploring how the PCA methodology of the model may affect results when replicating. Figure 5, ’Gene signatures associated with the two latent dimensions’, lacks some readability and related results could be outlined more clearly in the results section. There should be more discussion on weaknesses of the results e.g. quantification of what connectivity motifs were not captured and what gene signatures might have been missed.

      I value the suggestion of validating the proposed method in another dataset. In response, I found the C.elegans dataset in the references the reviewer suggested below to be a good candidate for this purpose, and I plan to explore this dataset and incorporate findings in the revised manuscript. I understand the concerns regarding the PCA methodology and its potential impact on the model’s resolution and reproducibility. In response, alternative methods, such as regularization techniques, will be explored to address these issues. Additionally, I agree that enhancing the clarity and readability of Figure 5, as well as including a more comprehensive discussion of the model’s limitations, would significantly strengthen the manuscript.

      The main weakness is the lack of comparison against other similar methods, e.g. methods presented in Barabási, Dániel L., and Albert-László Barabási. "A genetic model of the connectome." Neuron 105.3 (2020): 435-445. Kovács, István A., Dániel L. Barabási, and Albert-László Barabási. "Uncovering the genetic blueprint of the C. elegans nervous system." Proceedings of the National Academy of Sciences 117.52 (2020): 33570-33577. Taylor, Seth R., et al. "Molecular topography of an entire nervous system." Cell 184.16 (2021): 4329-4347.

      Thank you for highlighting the importance of comparing our model with others, particularly those mentioned in your comments. After reviewing these papers, I find that our bilinear model aligns closely with the methods described, especially in [1, 2]. To see this, let’s start with Equation 1 in Kovács et al. [2]:

      In this equation, B represents the connectivity matrix, while X denotes the gene expression patterns of individual neurons in C.elegans. The operator O is the genetic rule operator governing synapse formation, linking connectivity with individual neuronal expression patterns. It’s noteworthy that the work of Barabási and Barabási [1] explores a specific application of this framework, focusing on O for B that represents biclique motifs in the C.elegans neural network.

      To identify the operator O, the authors sought to minimize the squared residual error:

      with regularization on O.

      Adopting the notation from our bilinear model paper and using Z to represent the connectivity matrix, the above becomes

      Coming back to the bilinear model formulation, the optimization problem, as formulated for the C.elegans dataset where individual neuron connectivity and gene expression are accessible, takes the form:

      where we consider each neuron as a distinct neuronal type. In addition, we extend the dimensions of X and Y to encompass the entire set of neurons in C.elegans, with X = Y ∈ R^{n×p}, where n signifies the total number of neurons and p the number of genes. Accordingly, our optimization challenge evolves into:

      Upon comparison with the earlier stated equation, it becomes clear that our approach aligns consistently with the notion of O = AB^T. This effectively results in a decomposition of the genetic rule operator O. This decomposition extends beyond mere mathematical convenience, offering several substantial benefits reminiscent of those seen in the collaborative filtering of recommendation systems:

      • Computational Efficiency: The primary advantage of this approach is its improvement in computational efficiency. For instance, solving for O ∈ R^{p×p} necessitates determining p^2 entries. In contrast, solving for A ∈ R^{p×d} and B ∈ R^{p×d} involves determining only 2pd entries, where p is the number of genes, and d is the number of latent dimensions. Assuming the existence of a lower-dimensional latent space (d ≪ p) that captures the essential variability in connectivity, resolving A and B becomes markedly more efficient than resolving O. Additionally, from a computational system design perspective, inferring the connectivity of a neuron allows for caching the latent embeddings of presynaptic neurons XA or postsynaptic neurons XB with a space complexity of O(nd). This is significantly more space-efficient than caching XO or OX^T, which has a space complexity of O(np). This difference is particularly notable when dealing with large numbers of neurons, such as those in the entire mouse brain. The bilinear modeling approach thus enables effective handling of large datasets, simplifying the optimization problem and reducing computational load, thereby making the model more scalable and faster to execute.

      • Interpretability: The separation into A for presynaptic features and B for postsynaptic features provides a clearer understanding of the distinct roles of pre- and post- synaptic neurons in forming the connection. By projecting the pre- and post- synaptic neurons into a shared latent space through XA and YB, one can identify meaningful representations within each axis, as exemplified in different motifs from the mouse retina dataset. The linear characteristics of A and B facilitate direct evaluation of each gene’s contribution to a latent dimension. This interpretability, offering insights into the genetic factors influencing synaptic connections, is beyond what O could provide itself.

      • Flexibility and Adaptability: The bilinear model’s adaptability is another strength. Much like collaborative filtering, which can manage very different user and item features, our bilinear model can be tailored to synaptic partners with genetic data from varied sources. A potential application of this model is in deciphering the genetic correlates of long-range projectomic rules, where pre- and post-synaptic neurons are processed and sequenced separately, or even involving post-synaptic targets being brain regions with genetic information acquired through bulk sequencing. This level of flexibility also allows for model adjustments or extensions to incorporate other biological factors, such as proteomics, thereby broadening its utility across various research inquiries into the determinants of neuronal connectivity.
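      The efficiency argument above can be checked numerically. The following is a minimal sketch, assuming the predicted connectivity takes the form X O X^T with O = AB^T as stated in the text; the matrix sizes and the random data are illustrative assumptions, not values from either study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions): neurons, genes, latent dimensions.
n, p, d = 50, 200, 2

X = rng.standard_normal((n, p))   # gene expression, one row per neuron
A = rng.standard_normal((p, d))   # presynaptic transformation matrix
B = rng.standard_normal((p, d))   # postsynaptic transformation matrix

# Full genetic rule operator implied by the factorization O = A B^T.
O = A @ B.T                       # shape (p, p)

# Route 1: score every neuron pair through the full operator.
Z_full = X @ O @ X.T              # shape (n, n)

# Route 2: cache the d-dimensional embeddings and take inner products.
pre = X @ A                       # presynaptic embeddings, shape (n, d)
post = X @ B                      # postsynaptic embeddings, shape (n, d)
Z_lowrank = pre @ post.T

same = np.allclose(Z_full, Z_lowrank)

params_full = p * p               # entries to estimate in O
params_lowrank = 2 * p * d        # entries to estimate in A and B
cache_full = n * p                # entries in a cached XO
cache_lowrank = n * d             # entries in a cached XA (or XB)

print(same, params_full, params_lowrank, cache_full, cache_lowrank)
```

      With these sizes the factorized form needs 800 parameters instead of 40,000, and each cached embedding holds 100 numbers instead of 10,000, while both routes give the same connectivity scores.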

      In the study by Taylor et al. [3], the authors introduced a generalization of differential gene expressions (DGE) analysis called network DGE (nDGE) to identify genetic determinants of synaptic connections. It focuses on genes co-expressed across pairs of neurons connected, compared with pairs without connection.

      As the authors acknowledged in the method part of the paper, nDGE can only examine single genes co-expressed at synaptic terminals: "While the nDGE technique introduced here is a generalization of standard DGE, interrogating the contribution of pairs of genes in the formation and maintenance of synapses between pairs of neurons, nDGE can only account for a single co-expressed gene in either of the two synaptic terminals (pre/post)."

      In contrast, the bilinear model offers a more comprehensive analysis by seeking a linear combination of gene expressions in both pre- and post-synaptic neurons. This model goes beyond the scope of examining individual co-expressed genes, as it incorporates different weights for the gene expressions of pre- and post-synaptic neurons. This feature of the bilinear model enables it to capture not only homogeneous but also complex and heterogeneous genetic interactions that are pivotal in synaptic connectivity. This highlights the bilinear model’s capability to delve into the intricate interactions of synaptic gene expression.
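      To make the contrast concrete, the bilinear score for a presynaptic neuron i and a postsynaptic neuron j can be expanded gene by gene. The notation here is an assumption introduced for illustration: x_i and y_j are expression vectors over p genes, A and B are the p×d transformation matrices with columns a_k and b_k, and the predicted connectivity is written as a hatted z_ij.

```latex
\hat{z}_{ij}
  = (x_i^{\top} A)\,(y_j^{\top} B)^{\top}
  = x_i^{\top} A B^{\top} y_j
  = \sum_{k=1}^{d} \bigl(a_k^{\top} x_i\bigr)\bigl(b_k^{\top} y_j\bigr)
  = \sum_{g=1}^{p} \sum_{h=1}^{p} x_{ig}\, O_{gh}\, y_{jh},
\qquad
O_{gh} = \sum_{k=1}^{d} A_{gk} B_{hk}.
```

      Under this reading, every pair of a presynaptic gene g and a postsynaptic gene h contributes through its own weight O_gh, and g and h need not be the same gene. A single co-expressed gene, the case nDGE is described as handling, corresponds to a single diagonal term of this sum.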

      Appraisal of whether the author achieved their aims, and whether results support their conclusions: The author achieved their aims by recapitulating key connectivity motifs from single-cell gene expression data in the mouse retina. Furthermore, the model setup allowed for insight into gene signatures and interactions, however could have benefited from a deeper evaluation of the accuracy of these signatures. The author claims the method sets a new benchmark for single-cell transcriptomic analysis of synaptic connections. This should be more rigorously proven. (I’m not sure I can speak on the novelty of the method)

      I value your appraisal. In response, additional validation of the bilinear model on a second dataset will be undertaken.

      Discussion of the likely impact of the work on the field, and the utility of methods and data to the community: This study provides an understandable bilinear model for decoding the genetic programming of neuronal type connectivity. The proposed model leaves the door open for further testing and comparison with alternative linear and/or non-linear models, such as neural network-based models. In addition to more complex models, this model can be built on to include higher resolution data such as more gene expression dimensions, different types of connectivity measures, and additional omics data.

      Thank you for your positive assessment of the potential impact of the study.

      Response to Reviewer 2:

      Summary: In this study, Mu Qiao employs a bilinear modeling approach, commonly utilized in recommendation systems, to explore the intricate neural connections between different pre- and post-synaptic neuronal types. This approach involves projecting single-cell transcriptomic datasets of pre- and post-synaptic neuronal types into a latent space through transformation matrices. Subsequently, the cross-correlation between these projected latent spaces is employed to estimate neuronal connectivity. To facilitate the model training, connectomic data is used to estimate the ground-truth connectivity map. This work introduces a promising model for the exploration of neuronal connectivity and its associated molecular determinants. However, it is important to note that the current model has only been tested with Bipolar Cell and Retinal Ganglion Cell data, and its applicability in more general neuronal connectivity scenarios remains to be demonstrated.

      Strengths: This study introduces a succinct yet promising computational model for investigating connections between neuronal types. The model, while straightforward, effectively integrates single-cell transcriptomic and connectomic data to produce a reasonably accurate connectivity map, particularly within the context of retinal connectivity. Furthermore, it successfully recapitulates connectivity patterns and helps uncover the genetic factors that underlie these connections.

      Thank you for your positive assessment of the paper.

      Weaknesses:

      1. The study lacks experimental validation of the model’s prediction results.

      Thank you for pointing out the importance of experimental validation. I acknowledge that the current version of the study is focused on the development and validation of the computational model, using the datasets presently available to us. Moving forward, I plan to collaborate with experimental neurobiologists. These collaborations are aimed at validating our model’s predictions, including the delta-protocadherins mentioned in the paper. However, considering the extensive time and resources required for conducting and interpreting experimental results, I believe it is more pragmatic to present a comprehensive experimental study, including the design and execution of experiments informed by the model’s predictions, in a separate follow-up paper. I intend to include a paragraph in the discussion of this paper outlining the future direction for experimental validation.

      2. The model’s applicability in other neuronal connectivity settings has not been thoroughly explored.

      I recognize the importance of assessing the model across different neuronal systems. In response to similar feedback from Reviewer 1, I am keen to extend the study to include the C.elegans dataset mentioned earlier. The results from applying our bilinear model to the second dataset will be incorporated into the revised manuscript.

      3. The proposed method relies on the availability of neuronal connectomic data for model training, which may be limited or absent in certain brain connectivity settings.

      The concern regarding the dependency of our model on the availability of connectomic data is valid. While complete connectomes are available for organisms like C. elegans and Drosophila, and efforts are underway to map the connectome of the entire mouse brain, such data may not always be accessible for all research contexts. Recognizing this limitation, part of the ongoing research is to explore ways to adapt our model to the available data, such as projectomic data. Furthermore, our bilinear model is compatible with trans-synaptic virus-based sequencing techniques [4, 5], allowing us to leverage data from these experimental approaches to uncover the genetic underpinnings of neuronal connectivity. These initiatives are crucial steps towards broadening the applicability of our model, ensuring its relevance and usefulness in diverse brain connectivity studies where detailed connectomic data may not be readily available.

      References

      [1] Dániel L. Barabási and Albert-László Barabási. A genetic model of the connectome. Neuron, 105(3):435–445, 2020.

      [2] István A. Kovács, Dániel L. Barabási, and Albert-László Barabási. Uncovering the genetic blueprint of the C. elegans nervous system. Proceedings of the National Academy of Sciences, 117(52):33570–33577, 2020.

      [3] Seth R. Taylor, Gabriel Santpere, Alexis Weinreb, Alec Barrett, Molly B. Reilly, Chuan Xu, Erdem Varol, Panos Oikonomou, Lori Glenwinkel, Rebecca McWhirter, Abigail Poff, Manasa Basavaraju, Ibnul Rafi, Eviatar Yemini, Steven J. Cook, Alexander Abrams, Berta Vidal, Cyril Cros, Saeed Tavazoie, Nenad Sestan, Marc Hammarlund, Oliver Hobert, and David M. 3rd Miller. Molecular topography of an entire nervous system. Cell, 184(16):4329–4347, 2021.

      [4] Nicole Y. Tsai, Fei Wang, Kenichi Toma, Chen Yin, Jun Takatoh, Emily L. Pai, Kongyan Wu, Angela C. Matcham, Luping Yin, Eric J. Dang, Denise K. Marciano, John L. Rubenstein, Fan Wang, Erik M. Ullian, and Xin Duan. Trans-seq maps a selective mammalian retinotectal synapse instructed by nephronectin. Nat Neurosci, 25(5):659–674, May 2022.

      [5] Aixin Zhang, Lei Jin, Shenqin Yao, Makoto Matsuyama, Cindy van Velthoven, Heather Sullivan, Na Sun, Manolis Kellis, Bosiljka Tasic, Ian R. Wickersham, and Xiaoyin Chen. Rabies virus-based barcoded neuroanatomy resolved by single-cell RNA and in situ sequencing. bioRxiv, 2023.

    1. Author response:

      Reviewer #1:

      The only minor weakness that I found is the assumption of independence of bacterial species, which is expressed as the well-stirred approximation. One could imagine that bacterial species might cooperate, leading to non-uniform distributions that are real. How to distinguish such situations? I believe that this method can be extended to determine if this is the case or not before the application. For example, if the bacteria species are independent of each other and one can use the binomial distributions, then the Fano factor would be proportional to the overall relative fraction of bacterial species. Maybe a simple test can be added to test it before the application of REPOP. However, I believe that this is a minor issue.

      This is an interesting point raised by the reviewer.

      First, we need to clarify an important point: we do not make a well-stirred assumption. Samples can be drawn and plated from any region of space, however small, and that region’s population can be quantified using our method. The stirring only occurs after we collect a sample in order to dilute the contents and pour the solution homogeneously over the plate.

      As such, learning multiple independent species is possible and not impacted by the dilution (“well-stirred” assumption). In the revised manuscript we will make it clear that this assumption concerns the dilution process. Any correlation between species arises in the initial sample and should be retained in the plating. Once given the sample, the dilution itself produces independent binomial draws from that point in space from which cultures were harvested. REPOP is designed to recover the true underlying heterogeneity in species abundance (even from limited data) by leveraging a Bayesian framework that remains valid regardless of whether species are independent or correlated.
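
To illustrate what independent binomial draws in the dilution step imply for plate counts, here is a minimal simulation sketch. It is not part of REPOP: the function names and parameter values (`n_cells`, `phi`, `n_plates`) are assumptions chosen for the example. For a fixed sample of `n_cells` in which each cell reaches the plate with probability `phi`, binomial statistics give a variance-to-mean ratio (Fano factor) of `1 - phi`.

```python
import random
import statistics


def plate_counts(n_cells, phi, n_plates, rng):
    """Simulate colony counts: each of n_cells reaches a plate with probability phi."""
    return [sum(rng.random() < phi for _ in range(n_cells)) for _ in range(n_plates)]


def fano_factor(counts):
    """Variance-to-mean ratio of the observed counts."""
    return statistics.variance(counts) / statistics.mean(counts)


rng = random.Random(1)
# Hypothetical values: 200 cells per sample, 1-in-10 plating fraction, 2000 plates.
counts = plate_counts(n_cells=200, phi=0.1, n_plates=2000, rng=rng)
fano = fano_factor(counts)  # binomial theory predicts a value near 1 - phi = 0.9
```

A Fano factor well above `1 - phi` across replicate plates would point to variability in the underlying sample population rather than in the dilution step.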

      If one applies the method for multiple species as is, REPOP can recover the marginal distribution of each species in each plate if they are selectively cultured, or of many species at once if the colonies are sufficiently distinct. To demonstrate this, we will add to the manuscript a synthetic example with two species whose populations in a sample are correlated.

      However, in order to learn the joint distribution and capture correlations between species within samples, the method would need to be extended. At present, in Eq. 5 we sum the likelihood over all values of n, using a data-driven cutoff (twice the naïvely estimated count times the dilution factor). Extending this to multiple species, e.g., summing over pairs (n1, n2) for two species, while retaining the generality of the method, would require memory that scales quadratically with this cutoff in the population number. For this reason, while we will comment on this in the next version of the manuscript, it will not be implemented as part of REPOP.
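
The truncated sum over n and the growth of the joint grid can be sketched as below. This is an illustration under stated assumptions, not the REPOP code or the manuscript's Eq. 5: the Poisson prior over n, the function names, and the numerical values are all hypothetical, and the dilution factor is assumed to be an integer greater than 1.

```python
import math


def log_binomial_pmf(k, n, phi):
    """log P(k colonies | n cells in the sample, plating fraction phi)."""
    if k > n:
        return -math.inf
    log_comb = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_comb + k * math.log(phi) + (n - k) * math.log1p(-phi)


def log_poisson_pmf(n, lam):
    """log of a Poisson prior over n (the Poisson form is an illustrative assumption)."""
    return n * math.log(lam) - lam - math.lgamma(n + 1)


def marginal_likelihood(k, dilution_factor, lam):
    """Sum Binomial(k | n, phi) * prior(n) over n up to a data-driven cutoff."""
    phi = 1.0 / dilution_factor
    cutoff = 2 * k * dilution_factor  # twice the naive estimate k * dilution_factor
    total = 0.0
    for n in range(k, cutoff + 1):  # terms with n < k are zero
        total += math.exp(log_binomial_pmf(k, n, phi) + log_poisson_pmf(n, lam))
    return total, cutoff


def grid_size(cutoff, n_species):
    """Number of (n1, ..., ns) combinations a joint sum would have to visit."""
    return (cutoff + 1) ** n_species


p_k, cutoff = marginal_likelihood(k=10, dilution_factor=10, lam=100.0)
```

With these example numbers the single-species sum visits at most `cutoff + 1 = 201` values of n, whereas a two-species joint sum would span `grid_size(cutoff, 2) = 40401` pairs, which is the quadratic growth mentioned above.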

      Reviewer #2:

      A more thorough discussion of when and by how much estimated microbial population abundance distributions differ from the ground truth would be helpful in determining the best practices for applying this method. Not only would this allow researchers to understand the sampling effort necessary to achieve the results presented here, but it would also contextualize the experimental results presented in the paper. Particularly, there is a disconnect between the discussion of the large sample sizes necessary to achieve accurate multimodal distribution estimates and the small sample sizes used in both experiments.

      That is a great suggestion from the reviewer. To address it, we will expand Appendix B, which currently presents the relative error between the means for the experimental results in Fig. 3, to also include a comparable evaluation for the synthetic data example in Fig. 2.

      Specifically, for each example, we will report (1) the relative error in the estimated means (as already done for Fig. 3), and (2) the Kullback-Leibler (KL) divergence between the reconstructed and ground truth distributions. These metrics will be shown as a function of the size of the dataset, enabling a direct assessment of how the sampling effort affects the precision of the inference.
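
For concreteness, the two metrics can be sketched as follows. This is a minimal illustration assuming discrete distributions on a shared support; the function names, the direction of the divergence (ground truth relative to reconstruction), and the example numbers are assumptions, not taken from the manuscript.

```python
import math


def relative_error(estimated_mean, true_mean):
    """Absolute difference between the means, relative to the true mean."""
    return abs(estimated_mean - true_mean) / abs(true_mean)


def kl_divergence(p, q):
    """D_KL(p || q) in nats for two discrete distributions on the same support."""
    if len(p) != len(q):
        raise ValueError("p and q must share the same support")
    p_total, q_total = sum(p), sum(q)
    total = 0.0
    for p_i, q_i in zip(p, q):
        p_i, q_i = p_i / p_total, q_i / q_total  # normalize both inputs
        if p_i == 0.0:
            continue  # 0 * log(0 / q) is taken as 0
        if q_i == 0.0:
            return math.inf  # q assigns zero mass where p does not
        total += p_i * math.log(p_i / q_i)
    return total


truth = [0.5, 0.5]            # made-up ground-truth distribution
reconstructed = [0.25, 0.75]  # made-up reconstructed distribution
kl = kl_divergence(truth, reconstructed)
err = relative_error(estimated_mean=9.0, true_mean=10.0)
```

Evaluating both quantities on reconstructions from datasets of increasing size would give the error-versus-sampling-effort curves described above.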

      That said, we highlight that by explicitly modeling the dilution process within a Bayesian framework, REPOP extracts the mathematically optimal amount of information from each individual sample no matter the sample size. Our strategy therefore leads to better inference with fewer measurements, which is particularly important in applications such as plate counting, where data acquisition is labor-intensive.

      Reviewer #3:

      While the study is promising, there are a few areas where the paper could be strengthened to increase its impact and usability. First, the extent to which dilution and plating introduce noise is not fully explored. Could this noise significantly affect experimental conclusions? And under what conditions does it matter most? Does it depend on experimental design or specific parameter values? Clarifying this would help readers appreciate when and why REPOP should be used.

      We agree with the reviewer that this is an important point, and we will expand Appendix B to include a quantitative analysis using simulated data (Fig. 2), reporting both relative error and KL divergence as a function of dataset size. This complements our response to Reviewer #2 clarifying when REPOP offers the greatest benefit.

      In addition, we will expand the discussion on how modeling dilution noise becomes essential when learning population dynamics. In particular, we will emphasize the role of Model 3, especially relevant when working with multiple plates and approaching the asymptotic regime—an aspect that was alluded to in Fig. 3 but not fully explored.

      Second, more practical details about the tool itself would be very helpful. Simply stating that it is available on GitHub may not be enough. Readers will want to know what programming language it uses, what the input data should look like, and ideally, see a step-by-step diagram of the workflow. Packaging the tool as an easy-to-use resource, perhaps even submitting it to CRAN or including example scripts, would go a long way, especially since microbiologists tend to favor user-friendly, recipe-like solutions.

      We will update the introduction to reinforce that REPOP is written in Python (PyTorch), installable via pip, and designed for ease of use. We are also expanding the tutorials to include clearer guidance on data formatting and common workflows. Author response image 1 will be added in the revised manuscript to better illustrate the full application process.

      Author response image 1.

      Third, it would be great to see the method tested on existing datasets, such as those from Nic Vega and Jeff Gore (2017), which explore how colonization frequency impacts abundance fluctuation distributions. Even if the general conclusions remain unchanged, showing that REPOP can better match observed patterns would strengthen the paper’s real-world relevance.

      That is a great suggestion from the reviewer. We will demonstrate the application of REPOP to datasets such as that of Vega and Gore (Ref. 27 in the manuscript), as well as other publicly available datasets, in the revised version.

      Lastly, it would be helpful for the authors to briefly discuss the limitations of their method, as no approach is without its constraints. Acknowledging these would provide a more balanced and transparent perspective.

      We agree with the reviewer on that. A new subsection will explicitly address the assumptions of our method, and therefore its limitations, including assumptions about species classification, computational cost of joint inference, and dependence on accurate dilution modeling. This discussion will synthesize points raised throughout our response to all reviewers.

    1. Author response:

      Reviewer #1 (Public review):

      Strengths:

      The genetic approaches here for visualizing the recombination status of an endogenous allele are very clever, and by comparing the turnover of wildtype and mutant cells in the same animal the authors can make very convincing arguments about the effect of chronic loss of pu.1. Likely this phenotype would be either very subtle or nonexistent without the point of comparison and competition with the wildtype cells.

      Using multiple species allows for more generalizable results, and shows conservation of the phenomena at play.

      The demonstration of changes to proliferation and cell death in concert with higher expression of tp53 is compelling evidence for the authors' argument.

      Weaknesses:

      This paper is very strong. It would benefit from further investigating the specific relationship between pu.1 and tp53 specifically. Does pu.1 interact with the tp53 locus? Specific molecular analysis of this interaction would strengthen the mechanistic findings.

      We agree with the reviewer’s assessment regarding the significance of the relationship between PU.1 and TP53. A previous study by Tschan et al. (1) has shown that PU.1 attenuates the transcriptional activity of the p53 tumor suppressor family through direct binding to the DNA-binding and/or the oligomerization domains of p53/p73 proteins. We will discuss this point in the revised manuscript and cite this paper accordingly. Moreover, to further investigate the interaction between Pu.1 and Tp53 in zebrafish, we intend to perform a comprehensive analysis of the tp53 promoter region utilizing bioinformatic prediction tools. This approach aims to identify potential Pu.1 binding sites, thereby providing insights into the direct regulatory interactions between Pu.1 and the tp53 promoter in zebrafish.

      Reviewer #2 (Public review):

      Strengths:

      Generation of an elegantly designed conditional pu.1 allele in zebrafish that allows for the visual detection of expression of the knockout allele.

      The combination of analysis of pu.1 function in two model systems, zebrafish and mouse, strengthens the conclusions of the paper.

      Confirmation of the functional significance of the observed upregulation of tp53 in mutant microglia through double mutant analysis provides some mechanistic insight.

      Weaknesses:

      (1) The presented RNA-Seq analysis of mutant microglia is underpowered and details on how the data was analyzed are missing. Only 9-15 cells were analyzed in total (3 pools of 3-5 cells each). Further, the variability in relative gene expression of ccl35b.1, which was used as a quality control and inclusion criterion to define pools consisting of microglia, is extremely high (between ~4 and ~1600, Figure S7A).

      In the revised manuscript, we will elaborate on the methodological details of the RNA analysis. Owing to the technical challenge of unambiguously distinguishing microglia from dendritic cells (DCs) in brain cell suspensions, we employed a strategy of isolating 3-5 cells per pool and quantifying the relative expression of the microglia-specific marker ccl34b.1 normalized to the DC-specific marker ccl19a.1. This approach aimed to reduce DC contamination in downstream analyses. Across all experimental groups subjected to RNA-seq analysis, the ccl34b.1/ccl19a.1 expression ratios exceeded 5, confirming microglia as the dominant cell population. Nonetheless, residual DC contamination in the RNA-seq data cannot be entirely ruled out. We will explicitly acknowledge this technical constraint in the revised manuscript to ensure methodological transparency.

      (2) The authors conclude that the reduction of microglia observed in the adult brain after cKO of pu.1 in the spi-b mutant background is due to apoptosis (Lines 213-215). However, they only provide evidence of apoptosis in 3-5 dpf embryos, a stage at which loss of pu.1 alone does lead to a complete loss of microglia (Figure 2E). A control of pu.1 KI/d839 mutants treated with 4OHT should be added to show that this effect is indeed dependent on the loss of spi-b. In addition, experiments should be performed to show apoptosis in the adult brain after cKO of pu.1 in spi-b mutants as there seems to be a difference in the requirement of pu.1 in embryonic and adult stages.

      We apologize for the omission of data regarding conditional pu.1 knockout alone in embryos, which may have led to ambiguity. We would like to clarify that conditional pu.1 knockout alone at the embryonic stage does not induce microglial death (Author response image 1). Microglial death occurs only when Pu.1 is disrupted in the spi-b mutant background, in both embryonic and adult brains. The blebbing morphology of some microglia after pu.1 conditional knockout in the spi-b mutant background indicates that microglia undergo apoptosis at both embryonic (Figure S4) and adult stages (Author response image 2). The reviewer’s concern likely arises from the distinct outcomes of global pu.1 knockout (Figure 2) versus conditional pu.1 ablation. Global knockout eliminates microglia during early development due to Pu.1’s essential role in myeloid lineage specification. We plan to include this clarification in the revised manuscript.

      Author response image 1.

      Conditional depletion of Pu.1 in embryonic microglia had no effect on their short-term survival. (A) Schematics of 4-OHT treatment for pu.1<sup>KI/WT</sup> Tg(coro1a:CreER) and pu.1<sup>KI/Δ839</sup> Tg(coro1a:CreER) at embryonic stage. (B) Representative images of DsRed<sup>+</sup> microglia in pu.1<sup>KI/WT</sup> and pu.1<sup>KI/Δ839</sup> at 5 dpf. (C) Quantification of DsRed<sup>+</sup> microglia in pu.1<sup>KI/WT</sup> and pu.1<sup>KI/Δ839</sup> at 3 dpf and 5 dpf. Values represent means ± SD, n.s., P >0.05.

      Author response image 2. Simultaneous inactivation of Pu.1 and Spi-b leads to microglial death in adult zebrafish. (A) The experimental setup for pu.1 conditional knockout in adult spi-b<sup>Δ232/Δ232</sup> mutants. (B) Representative images of the midbrain cross section of adult pu.1<sup>KI/+</sup>;spi-b<sup>Δ232/Δ232</sup>;Tg(coro1a:CreER) and pu.1<sup>KI/WT</sup>;spi-b<sup>Δ232/Δ232</sup>;Tg(coro1a:CreER) fish at 2 dpi. The white arrow indicates microglia with blebbing morphology.

      (3) The number of microglia after pu.1 knockout in zebrafish did only show a significant decrease 3 months after 4-OHT injection, whereas microglia were almost completely depleted already 7 days after injection in mice. This major difference is not discussed in the paper.

      We propose that zebrafish Pu.1 and Spi-b function cooperatively to regulate microglial maintenance, analogous to the role of PU.1 alone in mice. This cooperative mechanism likely explains the observed difference in microglial depletion kinetics between zebrafish and mice following pu.1 conditional knockout. Specifically, the compensatory activity of Spi-b in zebrafish may buffer the immediate loss of Pu.1, whereas in mice, the absence of SPI-B expression in microglia eliminates this redundancy, resulting in rapid microglial depletion. Furthermore, during evolution, SPI-B appears to have acquired lineage-specific roles, becoming absent in microglia. We will expand on this evolutionary divergence and its implications for microglial regulation in the revised manuscript.

      (4) Data is represented as mean ± SEM. Instead of SEM, standard deviation should be shown in all graphs to show the variability of the data. This is especially important for all graphs where individual data points are not shown. It should also be stated in the figure legend if SEM or SD is shown.

      We plan to represent our data as mean ± SD in the revised manuscript.
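
The distinction the reviewer draws can be made concrete with a short sketch. The function name and the example values are hypothetical; the point is only that SD describes the spread of the data, whereas SEM equals SD divided by the square root of the sample size and therefore shrinks as n grows.

```python
import math
import statistics


def mean_sd_sem(values):
    """Return the mean, sample standard deviation (n - 1), and standard error of the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    sem = sd / math.sqrt(len(values))
    return mean, sd, sem


# Made-up cell counts from eight animals, for illustration only.
mean, sd, sem = mean_sd_sem([2, 4, 4, 4, 5, 5, 7, 9])
```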

      Reference:

      (1) Tschan MP, Reddy VA, Ress A, Arvidsson G, Fey MF, Torbett BE. PU.1 binding to the p53 family of tumor suppressors impairs their transcriptional activity. Oncogene. 2008 May 29;27(24):3489-93.

    1. Author response:

      eLife assessment

      This useful study reports how neuronal activity in the prefrontal cortex maps time intervals during which animals have to wait until reaching a reward and how this mapping is preserved across days. However, the evidence supporting the claims is incomplete as these sequential neuronal patterns do not necessarily represent time but instead may be correlated with stereotypical behavior and restraint from impulsive decision, which would require further controls (e.g. behavioral analysis) to clarify the main message. The study will be of interest to neuroscientists interested in decision making and motor control. 

      We thank the editors and reviewers for the constructive comments. In light of the questions mentioned by the reviewers, we plan to perform additional analyses in our revision, particularly aiming to address issues related to single-cell scalability, and effects of motivation and movement. We believe these additional data will greatly improve the rigor and clarity of our study. We are grateful for the review process of eLife.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This paper investigates the neural population activity patterns of the medial frontal cortex in rats performing a nose poking timing task using in vivo calcium imaging. The results showed neurons that were active at the beginning and end of the nose poking and neurons that formed sequential patterns of activation that covaried with the timed interval during nose poking on a trial-by-trial basis. The former were not stable across sessions, while the latter tended to remain stable over weeks. The analysis on incorrect trials suggests the shorter non-rewarded intervals were due to errors in the scaling of the sequential pattern of activity. 

      Strengths:

      This study measured stable signals using in vivo calcium imaging during experimental sessions that were separated by many days in animals performing a nose poking timing task. The correlation analysis on the activation profile to separate the cells in the three groups was effective and the functional dissociation between beginning and end, and duration cells was revealing. The analysis on the stability of decoding of both the nose poking state and poking time was very informative. Hence, this study dissected a neural population that formed sequential patterns of activation that encoded timed intervals. 

      We thank the reviewer for the positive comments.

      Weaknesses: 

      It is not clear whether animals had enough simultaneously recorded cells to perform the analyses of Figures 2-4. In fact, rat 3 had 18 responsive neurons which probably is not enough to get robust neural sequences for the trial-by-trial analysis and the correct and incorrect trial analysis.

      We thank the reviewer for the comment. We would like to mention that the 18 cells plotted in Supplementary figure 1 were only from the duration cell category. To improve the clarity of our results, we are going to provide information regarding the number of cells from each rat in our revision. In general, we imaged more than 50 cells from each rat. We would also like to point to the data from individual trials in Supplementary figure 1B showing robust sequentiality.

      In addition, the analysis of behavioral errors could be improved. The analysis in Figure 4A could be replaced by a detailed analysis on the speed, and the geometry of neural population trajectories for correct and incorrect trials.

      We thank the reviewer for the suggestions. We are going to conduct the analysis as the reviewer recommended. We agree with the reviewer that better presentation of the neural activity will be helpful for the readers.

      In the case of Figure 4G, it is not clear why the density of errors formed two clusters instead of having a linear relation with the produced duration. It would be advisable to compute the scaling factor on neuronal population trajectories and single cell activity, or to compute the center of mass, to test the type III errors.

      We would like to mention that the prediction errors plotted in this graph were calculated from two types of trials. The correct trials tended to show positive time estimation errors while the incorrect trials showed negative time estimation errors. We believe that the polarity switch between these two types suggested a possible use of this neural mechanism to time the action of the rats.

      In addition, we are going to perform the analysis suggested by the reviewer in our revision. We agree that different ways of analyzing the data would provide better characterization of the scaling effect.

      Due to the slow time resolution of calcium imaging, it is difficult to perform robust analysis on ramping activity. Therefore, I recommend downplaying the conclusion that: "Together, our data suggest that sequential activity might be a more relevant coding regime than the ramping activity in representing time under physiological conditions." 

      We agree with the reviewer and we have mentioned this caveat in our original manuscript. We are going to rephrase the sentence as the reviewer suggested during our revision.

      Reviewer #2 (Public Review):

      In this manuscript, Li and collaborators set out to investigate the neuronal mechanisms underlying "subjective time estimation" in rats. For this purpose, they conducted calcium imaging in the prefrontal cortex of water-restricted rats that were required to perform an action (nosepoking) for a short duration to obtain drops of water. The authors provided evidence that animals progressively improved in performing their task. They subsequently analyzed the calcium imaging activity of neurons and identified start, duration, and stop cells associated with the nose poke. Specifically, they focused on duration cells and demonstrated that these cells served as a good proxy for timing on a trial-by-trial basis, scaling their pattern of activity in accordance with changes in behavioral performance. In summary, as stated in the title, the authors claim to provide mechanistic insights into subjective time estimation in rats, a function they deem important for various cognitive conditions.

      This study aligns with a wide range of studies in system neuroscience that presume that rodents solve timing tasks through an explicit internal estimation of duration, underpinned by neuronal representations of time. Within this framework, the authors performed complex and challenging experiments, along with advanced data analysis, which undoubtedly merits acknowledgement. However, the question of time perception is a challenging one, and caution should be exercised when applying abstract ideas derived from human cognition to animals. Studying so-called time perception in rats has significant shortcomings because, whether acknowledged or not, rats do not passively estimate time in their heads. They are constantly in motion. Moreover, rats do not perform the task for the sake of estimating time but to obtain their rewards, as they are water restricted. Their behavior will therefore reflect their motivation and urgency to obtain rewards. Unfortunately, it appears that the authors are not aware of these shortcomings. These alternative processes (motivation, sensorimotor dynamics) that occur during task performance are likely to influence neuronal activity. Consequently, my review will be rather critical. It is not however intended to be dismissive. I acknowledge that the authors may have been influenced by numerous published studies that already draw similar conclusions. Unfortunately, all the data presented in this study can be explained without invoking the concept of time estimation. Therefore, I hope the authors will find my comments constructive and understand that as scientists, we cannot ignore alternative interpretations, even if they conflict with our a priori philosophical stance (e.g., duration can be explicitly estimated by reading neuronal representation of time) and anthropomorphic assumptions (e.g., rats estimate time as humans do).
      While space is limited in a review, if the authors are interested, they can refer to a lengthy review I recently published on this topic, which demonstrates that my criticism is supported by a wide range of timing experiments across species (Robbe, 2023). In addition to this major conceptual issue that casts doubt on most of the conclusions of the study, there are also several major statistical issues.

      Main Concerns 

      (1) The authors used a task in which rats must poke for a minimal amount of time (300 ms and then 1500 ms) to be able to obtain a drop of water delivered a few centimeters right below the nosepoke. They claim that their task is a time estimation task. However, they forget that they work with thirsty rats that are eager to get water sooner than later (there is a reason why they start by a short duration!). This task is mainly probing the animals’ ability to wait (that is, impulse control) rather than time estimation per se. Second, the task does not require estimating time precisely because there appear to be no penalties when the nosepokes are too short or when they exceed. So it will be unclear if the variation in nosepoke reflects motivational changes rather than time estimation changes. The fact that this behavioral task is a poor assay for time estimation and rather reflects impulse control is shown by the tendency of animals to perform nose-pokes that are too short, the very slow improvement in their performance (Figure 1, with most of the mice making short responses), and the huge variability. Not only do the behavioral data not support the claim of the authors in terms of what the animals are actually doing (estimating time), but this also completely annihilates the interpretation of the Ca++ imaging data, which can be explained by motivational factors (changes in neuronal activity occurring while the animals nose poke may reflect a growing sense of urgency to check if water is available).

      We would like to respond to the reviewer’s comments 1, 2 and 4 together since they all focus on the same issue. We thank the reviewer for the very thoughtful comments and for sharing his detailed reasoning from a recently published review (Robbe, 2023). A lot of the discussion goes beyond the scope of this study and we agree that whether there is an explicit representation of time (an internal clock) in the brain is a difficult question to answer, particularly by using animal behaviors. In fact, even with fully conscious humans and elaborate task designs, we think it is still questionable whether the neural substrate of “timing” can be clearly dissociated from “motor”. In the end, it may as well be that, as the reviewer cited from Bergson’s article, the experience of time cannot be measured.

      Studying the neural representation of any internal state may suffer from the same ambiguity. With all due respect, however, we would like to limit our response to the scope of our results. According to the reviewer, two alternative interpretations of the task-related sequential activity exist: 1, duration cells may represent fidgeting or orofacial movements and 2, duration cells may represent motivation or motion plan of the rats. To test the first alternative interpretation, we will perform a more comprehensive analysis of the behavior data for all the limbs and visible body parts of the rat during nose poke and analyze its periodicity among different trials, although the orofacial movements may not be visible to us.

      Regarding the second alternative interpretation, we think our data in the original Figure 4G argue against it. In this graph, we plotted the decoding error of time using the duration cells’ activity against the actual duration of the trials. If the sequential activity of duration cells only represents motivation, then the errors should be distributed evenly across different trial times, or be linearly modulated by trial durations. The unimodal distribution we observed (Figure 4G and see Author response image 1 below for a re-plot without signs) suggests that the scaling factor of the sequential activity represents information related to time. And the fact that this unimodal distribution is centered at the time threshold of the task provides strong evidence for the active use of the scaling factor for time estimation. In order to further test the relationship to motivation, we will measure the time interval between exiting the nose poke and the start of licking the water reward as an independent measurement of motivation for each trial. We will analyze and report whether this measurement correlates with the nose poking durations in our data in the revision.

      Author response image 1.

      Furthermore, whether the scaling sequential activity we report represents behavioral timing or true time estimation, the reviewer would agree that these activities correlate with the animal’s nose poking durations, and a previous study has shown that PFC silencing led to disruption of the mouse’s timing behavior (PMID: 24367075). The main surprising finding of the paper is that these duration cells are different from the start and end cells in terms of their coding stability. Thus, future studies dissecting the anatomical microcircuit of these duration cells may provide further clues regarding whether they receive inputs from thirst or reward-related brain regions. This may help partially resolve the “time” vs. “motor” debate the reviewer mentioned.

      (2) A second issue is that the authors seem to assume that rats are perfectly immobile and perform like some kind of robots that would initiate nose pokes, maintain them, and remove them in a very discretized manner. However, in this kind of task, rats are constantly moving from the reward magazine to the nose poke. They also move while nose-poking (either their body or their mouth), and when they come out of the nose poke, they immediately move toward the reward spout. Thus, there is a continuous stream of movements, including fidgeting, that will covary with timing. Numerous studies have shown that sensorimotor dynamics influence neural activity, even in the prefrontal cortex. Therefore, the authors cannot rule out that what the records reflect are movements (and the scaling of movement) rather than underlying processes of time estimation (some kind of timer). Concretely, start cells could represent the ending of the movement going from the water spout to the nosepoke, and end cells could be neurons that initiate (if one can really isolate any initiation, which I doubt) the movement from the nosepoke to the water spout. Duration cells could reflect fidgeting or orofacial movements combined with an increasing urgency to leave the nose pokes.

      (3) The statistics should be rethought for both the behavioral and neuronal data. They should be conducted separately for all the rats, as there is likely interindividual variability in the impulsivity of the animals.

      We thank the reviewer for the comment, yet we are not quite sure what specifically the reviewer is asking for. There is undoubtedly variance among individual animals. One of the core purposes of statistical comparison is to weigh the group difference against the variance due to sampling. It appears that the reviewer would like us to conduct our analysis for each rat individually. We will conduct and report analyses of individual rats for Figure 1C, Figure 2C, G, K, and Figure 4F in our revised manuscript.

      (4) The fact that neuronal activity reflects an integration of movement and motivational factors rather than some abstract timing appears to be well compatible with the analysis conducted on the error trials (Figure 4), considering that the sensorimotor and motivational dynamics will rescale with the durations of the nose poke. 

      (5) The authors should mention upfront in the main text (result section) the temporal resolution allowed by their Ca2+ probe and discuss whether it is fast enough with regard to the behavioral dynamics occurring in the task.

      We thank the reviewer for the suggestion. We originally mentioned the caveat of calcium imaging in the interpretation of our results. We will incorporate more text for this purpose during our revision. In terms of behavioral dynamics (start and end of nose poke in this case), we think calcium imaging could provide sufficient kinetics. However, the more refined dynamics related to the reproducibility of the sequential activity or the precise representation of individual cells on the scaled duration may benefit from improved time resolution.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Please refer explicitly to the three types of cells in the abstract. 

      We will modify the abstract as suggested during revision.

      (2) Please refer to the work of Betancourt et al., 2023 Cell Reports, where a trial-by-trial analysis on the correlation between neural trajectory dynamics in MPC and timing behavior is reported. In that same paper the stability of neural sequences across task parameters is reported.

      We will cite and discuss this study in our revised paper.

      (3) Please state the number of studied animals at the beginning of the results section. 

      We will provide this information as requested. The number of animals was also plotted in Figure 1D for each analysis.

      (4) Why do the middle and right panels of Figure 2E show duration cells?

      Figure 2E was intended to show examples of duration cells’ activity. We included different examples of cells that peak at different points in the scaled duration. We believe these multiple examples would give the readers a straightforward impression of these cells’ activity patterns.

      (5) Which behavioral sessions of Figure 1B were analyzed further?

      We will label the analyzed sessions in Figure 1B during our revision.

      (6) In Figure 3A-C please increase the time before the beginning of the trial in order to visualize properly the activation patterns of the start cells. 

      We thank the reviewer for the suggestion and will modify the figure accordingly during revision.

      (7) Please state what could be the behavioral and functional effect of the ablation of the cortical tissue on top of mPFC. 

      We thank the reviewer for the question. In our experience, mice with a lens implanted in mPFC did not show observable differences from mice without surgery regarding the acquisition of the task and the distribution of the nose-poke durations. Although we could not rule out effects on other cognitive processes, the mice appeared to be intact within the scope of our task. We will provide these behavior data during our revision.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      SUFU modulates Sonic hedgehog (SHH) signaling and is frequently mutated in the B-subtype of SHH-driven medulloblastoma. The B-subtype occurs mostly in infants, is often metastatic, and lacks specific treatment. Yabut et al. found that Fgf5 was highly expressed in the B-subtype of SHH-driven medulloblastoma by examining a published microarray expression dataset. They then investigated how Fgf5 functions in the cerebellum of mice that have embryonic Sufu loss of function. This loss was induced using the hGFAP-cre transgene, which is expressed in multiple cell types in the developing cerebellum, including granule neuron precursors (GNPs) derived from the rhombic lip. By measuring the area of Pax6+ cells in the external granule cell layer (EGL) of Sufu-cKO mice at postnatal day 0, they find Pax6+ cells occupy a larger area in the posterior lobe adjacent to the secondary fissure, which is poorly defined. They show that Fgf5 RNA and phosphoErk1/2 immunostaining are also higher in the same disrupted region. Some of the phosphoErk1/2+ cells are proliferative in the Sufu-cKO. Western blot analysis of Gli proteins that modulate SHH signaling found reduced expression and absence of Gli1 activity in the region of cerebellar dysgenesis in Sufu-cKO mice. This suggests the GNP expansion in this region is independent of SHH signaling. Amazingly, intraventricular injection of the FGFR1-2 antagonist AZD4547 from P0-4, followed by histological examination at P7, showed that the treatment restored cytoarchitecture in the cerebella of Sufu-cKO mice. This is further supported by NeuN immunostaining in the internal granule cell layer, which labels mature, non-dividing neurons, and by KI67 immunostaining, which indicates dividing cells and was found primarily in the EGL. The mice were treated beginning at a timepoint when cerebellar cytoarchitecture was shown to be disrupted, and it is indistinguishable from control following treatment. Figure 3 presents the most convincing and exciting data in this manuscript.

      Sufu-cKO mice do not readily develop cerebellar tumors. The authors detected phosphorylated H2AX immunostaining, which labels double-strand breaks, in some cells in the EGL in regions of cerebellar dysgenesis in the Sufu-cKO, as well as cleaved Caspase 3, a marker of apoptosis. P53 protein, which acts downstream of the double-strand break pathway, was reduced in the Sufu-cKO cerebellum. Genetically removing p53 from the Sufu-cKO cerebellum resulted in cerebellar tumors in 2-month-old mice. The Sufu;p53-dKO cerebella at P0 lacked clear foliation and the secondary fissure, even more so than the Sufu-cKO. Fgf5 RNA and signaling (pERK1/2) were also expressed ectopically.

      The conclusions of the paper are largely supported by the data, but some data analyses need to be clarified and extended.

      (1) The rationale for examining Fgf5 in medulloblastoma is not sufficiently convincing. The authors previously reported that Fgf15 was upregulated in neocortical progenitors of mice with conditional loss of Sufu (PMID: 32737167). In Figure 1, the authors report FGF5 expression is higher in SHH-type medulloblastoma, especially the beta and gamma subtypes mostly found in infants. These data were derived from a genome-wide dataset and are shown without correction for multiple testing, including other Fgfs. Showing the expression of other Fgfs with FDR correction would better substantiate their choice or moving this figure to later in the manuscript as support for their mouse investigations would be more convincing.

      To assess FGF5 (ENSG00000138675) expression in MB tissues, we used GEO2R (Barrett et al., 2013) to analyze published human MB subtype expression arrays from accession no. GSE85217 (Cavalli et al., 2017). GEO2R is an interactive web tool that compares expression levels of genes of interest (GOI) between sample groups in the GEO series using original submitter-supplied processed data tables. We entered the GOI Ensembl ID and organized data sets according to age and MB subgroup or MBSHH subtype classifications. GEO2R results presented gene expression levels as a table ordered by FDR-adjusted (Benjamini & Hochberg) p-values, with significance level cut-off at 0.05, processed by GEO2R’s built-in limma statistical test. Resulting data were subsequently exported into Prism (GraphPad). We generated scatter plots presenting FGF5 expression levels across all MB subgroups (Figure 1A) and MBSHH subtypes (Figure 1D). We performed additional statistical analyses to compare FGF5 expression levels between MB subgroups and MBSHH subtypes and graphed these data as violin plots (Figure 1B, 1C, and 1E). For these analyses, we used one-way ANOVA with Holm-Sidak’s multiple comparisons test, single pooled variance. P value ≤0.05 was considered statistically significant. Graphs display the mean ± standard error of the mean (SEM).
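      For readers unfamiliar with the FDR adjustment mentioned above, the Benjamini-Hochberg step can be sketched as follows. This is a minimal illustration of the standard procedure, not a reproduction of GEO2R or limma internals; the function name `benjamini_hochberg`, the gene labels, and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four genes (illustrative only, not real data).
raw = {"GENE_A": 0.001, "GENE_B": 0.02, "GENE_C": 0.03, "GENE_D": 0.5}
adj = benjamini_hochberg(list(raw.values()))

# Genes passing the 0.05 cut-off after adjustment.
significant = [gene for gene, q in zip(raw, adj) if q <= 0.05]
print(dict(zip(raw, adj)), significant)
```

      Ranking genes by the adjusted values rather than the raw p-values is what allows a single cut-off, such as 0.05, to control the expected proportion of false discoveries across all genes tested.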

      Author response image 1.

      Comparative expression of FGF ligands, FGF5, FGF10, FGF12, and FGF19, across all MB subgroups. FGF12 expression is not significantly different, while FGF5, FGF10, and FGF19, show distinct upregulation in MBSHH subgroup (MBWNT n=70, MBSHH n=224, MBGR3 n=143, MBGR4 n=326).

      Expression of the 21 known FGF ligands was also analyzed. Many FGFs did not exhibit differential expression levels in MBSHH compared to other MB subgroups, such as with FGF12 in Figure 1. FGF5, FGF10, and FGF19 (the human orthologue of mouse FGF15) all showed specific upregulation in MBSHH compared to other MB subgroups (Author response image 1), supporting our previous observations that FGF15 is a downstream target of SHH signaling (Yabut et al., 2020), as the reviewer pointed out. However, further stratification of MBSHH patient data revealed that only FGF5 specifically showed upregulation in infants with MBSHH (MBSHHb and MBSHHg; Author response image 2), indicating a more prominent role for FGF5 in the developing cerebellum and as a driver of MBSHH tumorigenesis in this dynamic environment.

      Author response image 2.

      Comparative expression of FGF5, FGF10, and FGF19 in different MBSHH subtypes. FGF5 specifically shows relative mRNA levels above 6 in 81% of MBSHH infant patient tumors (n=80 MBSHHb and MBSHHg tumors), unlike 35% of MBSHHa (n=65) or 0% of MBSHHd (n=75) tumors.

      (2) The Sufu-cKO cerebellum lacks a clear anchor point at the secondary fissure and foliation is disrupted in the central and posterior lobes. It would be helpful for the authors to review Sudarov & Joyner (PMID: 18053187) for nomenclature specific to the developing cerebellum.

      The reviewers are correct that the cerebellar foliation is severely disrupted in the central and posterior lobes, as per Sudarov and Joyner (Neural Development 2007). This nomenclature may be used to describe the regions referred to in this manuscript.

      (3) The metrics used to quantify cerebellar perimeter and immunostaining are not sufficiently described. It is unclear whether the individual points in the bar graph represent a single section from independent mice, or multiple sections from the same mice. For example, in Figures 2B-D. This also applies to Figure 3C-D.

      All quantifications were performed on 2-3 20 µm cerebellar sections from each of 3-6 independent mice per genotype. Individual points in the bar graphs represent the average cell number (quantified from 2-3 sections) for each mouse. Figure 2B shows data points from n=4 mice per genotype. Figure 2C shows data from n=3 mice per genotype. Figure 2D shows data from n=6 mice per genotype. Figure 3C-D shows data from n=3 mice per genotype.
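      The per-mouse averaging described above, in which each plotted point is the mean over that animal's sections, can be sketched as follows. This is an illustration only; the mouse identifiers, counts, and the helper name `per_mouse_means` are hypothetical and not taken from the study.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-section counts: (mouse ID, genotype, cell count).
section_counts = [
    ("m1", "control", 100), ("m1", "control", 110),
    ("m2", "control", 90), ("m2", "control", 95), ("m2", "control", 100),
    ("m3", "Sufu-cKO", 150), ("m3", "Sufu-cKO", 170),
]


def per_mouse_means(rows):
    """Collapse section-level counts to one mean value per mouse."""
    by_mouse = defaultdict(list)
    for mouse_id, genotype, count in rows:
        by_mouse[(genotype, mouse_id)].append(count)
    # One data point per animal, so sections are not treated as replicates.
    return {key: mean(values) for key, values in by_mouse.items()}


points = per_mouse_means(section_counts)
for (genotype, mouse_id), value in sorted(points.items()):
    print(genotype, mouse_id, value)
```

      Collapsing to one value per animal keeps the sample size equal to the number of mice, which avoids inflating n by counting multiple sections from the same animal as independent observations.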

      (4) The data on Fgf5 RNA expression presented in Figure 2E are not sufficiently convincing. The perimeter and cytoarchitecture of the cerebellum are difficult to see and the higher magnification shown in 2F should be indicated in 2E.

      The lack of foliation in the Sufu-cKO cerebellum is clear, particularly when visualizing the perimeter via DAPI labeling (Figure 2E). The expression area of FGF5 is also visibly larger, given that all images in Figure 2E are presented at the same scale (scale bars = 500 µm).

      (5) The data presented in Figure 3 are not sufficiently convincing. The number of cells double positive for pErk and KI67 (Figure 3B) are difficult to see and appear to be few, suggesting the quantification may be unreliable.

      We used KI67+ expression to provide a molecular marker of regions to be quantified in both WT and Sufu-cKO sections. Quantification of labeled cells was performed on images obtained by confocal microscopy, enabling imaging of 1-2 µm optical slices, since Ki67 and pERK expression might not localize within the same cellular compartments. We relied on continuous DAPI nuclear staining to distinguish individual cells in each optical slice and to assess the colocalization of Ki67 and pERK. All quantifications were performed on 2-3 20 µm cerebellar sections from each of 3-6 independent mice per genotype. Individual points in the bar graphs represent the average cell number (quantified from 2-3 sections) for each mouse.

      (6) The data presented in Figure 4F-J would be more convincing with quantification. The Sufu;p53-dKO appears to have a thickened EGL across the entire vermis perimeter, and very little foliation, relative to control and single cKO cerebella. This is a more widespread effect than the more localized foliation disruption in the Sufu-cKO. 

      We agree with the reviewers that quantification of these phenotypes would provide a solid measure of the defects. The phenotypes of the Sufu;p53-dKO cerebellum are so profound that they require in-depth characterization, which will be the focus of future studies.

      (7) Figure 5 does not convincingly summarize the results. Blue and purple cells in sagittal cartoon are not defined. Which cells express Fgf5 (or other Fgfs) has not been determined. The yellow cells are not defined in relation to the initial cartoon on the left.

      The revised manuscript will address this confusion by clearly labeling the cells and their roles in the schematic diagram.

      Reviewer #2 (Public Review):

      Summary:

      Mutations in SUFU are implicated in SHH medulloblastoma (MB). SUFU modulates Shh signaling in a context-dependent manner, making its role in MB pathology complex and not fully understood. This study reports that elevated FGF5 levels are associated with a specific subtype of SHH MB, particularly in pediatric cases. The authors demonstrate that Sufu deletion in a mouse model leads to abnormal proliferation of granule cell precursors (GCPs) at the secondary fissure (region B), correlating with increased Fgf5 expression. Notably, pharmacological inhibition of FGFR restores normal cerebellar development in Sufu mutant mice.

      Strengths:

      The identification of increased FGF5 in subsets of MB is novel and a key strength of the paper.

      Weaknesses:

      The study appears incomplete despite the potential significance of these findings. The current paper does not fully establish the causal relationship between Fgf5 and abnormal cerebellar development, nor does it clarify its connection to SUFU-related MB. Some conclusions seem overstated, and the central question of whether FGFR inhibition can prevent tumor formation remains untested.

      Reviewer #3 (Public Review):

      Summary:

      The interaction between FGF signaling and SHH-mediated GNP expansion in MB, particularly in the context of Sufu LoF, has just begun to be understood. The manuscript by Yabut et al. establishes a connection between ectopic FGF5 expression and GNP over-expansion in a late-stage embryonic Sufu LoF model. The data provided links region-specific interaction between aberrant FGF5 signaling with the SHH subtype of medulloblastoma. New data from Yabut et al. suggest that ectopic FGF5 expression correlates with GNP expansion near the secondary fissure in Sufu LoF cerebella. Furthermore, pharmacological blockade of FGF signaling inhibits GNP proliferation. Interestingly, the data indicate that the timing of conditional Sufu deletion (E13.5 using the hGFAP-Cre line) results in different outcomes compared to later deletion (using Math1-cre line, Jiwani et al., 2020). This study provides significant insights into the molecular mechanisms driving GNP expansion in SHH subgroup MB, particularly in the context of Sufu LoF. It highlights the potential of targeting FGF5 signaling as a therapeutic strategy. Additionally, the research offers a model for better understanding MB subtypes and developing targeted treatments.

      Strengths:

      One notable strength of this study is the extraction and analysis of ectopic FGF5 expression from a subset of MB patient tumor samples. This translational aspect of the study enhances its relevance to human disease. By correlating findings from mouse models with patient data, the authors strengthen the validity of their conclusions and highlight the potential clinical implications of targeting FGF5 in MB therapy.

      The data convincingly show that FGFR signaling activation drives GNP proliferation in Sufu conditional knockout models. This finding is supported by robust experimental evidence, including pharmacological blockade of FGF signaling, which effectively inhibits GNP proliferation. The clear demonstration of a functional link between FGFR signaling and GNP expansion underscores the potential of FGFR as a therapeutic target in SHH subgroup medulloblastoma.

      Previous studies have demonstrated the inhibitory effect of FGF2 on tumor cell proliferation in certain MB types, such as the ptc mutant (Fogarty et al., 2006)(Yaguchi et al., 2009). Findings in this manuscript provide additional support suggesting multiple roles for FGF signaling in cerebellar patterning and development.

      Weaknesses:

      In the GEO dataset analysis, where FGF5 expression is extracted, the reporting of the P-value lacks detail on the statistical methods used, such as whether an ANOVA or t-test was employed. Providing comprehensive statistical methodologies is crucial for assessing the rigor and reproducibility of the results. The absence of this information raises concerns about the robustness of the statistical analysis.

      The revised manuscript will include the following detailed explanation of the statistical analyses of the GEO dataset:

      For the analysis of expression values of FGF5 (ENSG00000138675), we obtained these values using GEO2R (Barrett et al., 2013), which directly analyzes published human MB subtype expression arrays from accession no. GSE85217 (Cavalli et al., 2017). GEO2R is an interactive web tool that compares expression levels of genes of interest (GOI) between sample groups in the GEO series using original submitter-supplied processed data tables. We simply entered the GOI Ensembl ID and organized data sets according to age and MB subgroup or MBSHH subtype classifications. GEO2R results presented gene expression levels as a table ordered by FDR-adjusted (Benjamini & Hochberg) p-values, with significance level cut-off at 0.05, processed by GEO2R’s built-in limma statistical test. Resulting data were subsequently exported into Prism (GraphPad). We generated scatter plots presenting FGF5 expression levels across all MB subgroups (Figure 1A) and MBSHH subtypes (Figure 1D). We performed additional statistical analyses to compare FGF5 expression levels between MB subgroups and MBSHH subtypes and graphed these data as violin plots (Figure 1B, 1C, and 1E). For these analyses, we used one-way ANOVA with Holm-Sidak’s multiple comparisons test, single pooled variance. P value ≤0.05 was considered statistically significant. Graphs display the mean ± standard error of the mean (SEM). Sample sizes were:

      Author response table 1.
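      The multiplicity correction named in the statistical description above, Holm-Sidak, can be sketched as a step-down adjustment of p-values. This is a minimal illustration of the adjustment step alone, assuming the raw pairwise p-values have already been computed; it does not reproduce Prism's pooled-variance comparisons, and the function name `holm_sidak` and the example values are hypothetical.

```python
def holm_sidak(p_values):
    """Return Holm-Sidak step-down adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for step, idx in enumerate(order):
        remaining = m - step  # comparisons not yet stepped through
        candidate = 1.0 - (1.0 - p_values[idx]) ** remaining
        # Enforce monotonicity: an adjusted value never falls below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[idx] = min(running_max, 1.0)
    return adjusted


# Hypothetical raw p-values from three pairwise comparisons (not real data).
raw_p = [0.01, 0.04, 0.03]
adjusted_p = holm_sidak(raw_p)
print(adjusted_p)
```

      Comparisons whose adjusted value is at or below 0.05 would be reported as significant under the threshold stated above.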

      Another concern is related to the controls used in the study. Cre recombinase induces double-strand DNA breaks within the loxP sites, and the control mice did not carry the Cre transgene (as stated in the Method section), while Sufu-cKO mice did. This discrepancy necessitates an additional control group to evaluate the effects of Cre-induced double-strand breaks on phosphorylated H2AX-DSB signaling. Including this control would strengthen the validity of the findings by ensuring that observed effects are not artifacts of Cre recombinase activity.

      The breeding scheme we used to generate homozygous SUFU conditional mutants will not generate pups carrying only hGFAP-Cre. Thus, we are unable to compare gH2AX expression in littermates that do not carry loxP sites. The reviewer is correct in pointing out the possibility of Cre recombinase activity inducing double-strand breaks on its own. However, it is likely that any hGFAP-Cre-induced double-strand breaks are not sufficient to cause the phenotypes we observed in homozygous mutant (Sufu-cKO) mice, because the cerebella of mice carrying heterozygous SUFU mutations (hGFAP-Cre;Sufu-fl/+) do not display the profound cerebellar phenotypes observed in Sufu-cKO mice. We cannot rule out, however, undetected abnormalities, which may require further analyses.

      Although the use of the hGFAP-Cre line allows genetic access to the late embryonic stage, this also targets multiple cell types, including both GNPs and cerebellar glial cells. However, the authors focus primarily on GNPs without fully addressing the potential contributions of neuron-glial interaction. This oversight could limit the understanding of the broader cellular context in which FGF signaling influences tumor development.

      The reviewer is correct in that hGFAP-Cre also targets other cell types, such as cerebellar glial cells, which are generated after Cre expression has begun. It is possible that cerebellar glial cell development is also compromised in Sufu-cKO mice and may disrupt neuron-glial interaction, due to or independently of FGF signaling. In-depth studies are required to interrogate how loss of SUFU specifically affects the development of cerebellar glial cells and influences their cellular interactions in the developing cerebellum.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study by McKim et al seeks to provide a comprehensive description of the connectivity of neurosecretory cells (NSCs) using a high-resolution electron microscopy dataset of the fly brain and several single-cell RNA seq transcriptomic datasets from the brain and peripheral tissues of the fly. They use connectomic analyses to identify discrete functional subgroups of NSCs and describe both the broad architecture of the synaptic inputs to these subgroups as well as some of the specific inputs including from chemosensory pathways. They then demonstrate that NSCs have very few traditional presynapses consistent with their known function as providing paracrine release of neuropeptides. Acknowledging that EM datasets can't account for paracrine release, the authors use several scRNAseq datasets to explore signaling between NSCs and characterize widespread patterns of neuropeptide receptor expression across the brain and several body tissues. The thoroughness of this study allows it to largely achieve its goal and provides a useful resource for anyone studying neurohormonal signaling.

      Strengths:

      The strengths of this study are the thorough nature of the approach and the integration of several large-scale datasets to address shortcomings of individual datasets. The study also acknowledges the limitations that are inherent to studying hormonal signaling and provides interpretations within the context of these limitations.

      Weaknesses:

      Overall, the framing of this paper needs to be shifted from statements of what was done to what was found. Each subsection, and the narrative within each, is framed on topics such as "synaptic output pathways from NSC" when there are clear and impactful findings such as "NSCs have sparse synaptic output". Framing the manuscript in this way allows the reader to identify broad takeaways that are applicable to other model systems. Otherwise, the manuscript risks being encyclopedic in nature. An overall synthesis of the results would help provide the larger context within which this study falls.

      We agree with the reviewer and will replace all the subsection titles as suggested.

      The cartoon schematic in Figure 5A (which is adapted from a 2020 review) has an error. This schematic depicts uniglomerular projection neurons of the antennal lobe projecting directly to the lateral horn (without synapsing in the mushroom bodies) and multiglomerular projection neurons projecting to the mushroom bodies and then lateral horn. This should be reversed (uniglomerular PNs synapse in the calyx and then further project to the LH and multiglomerular PNs project along the mlACT directly to the LH) and is nicely depicted in a Strutz et al 2014 publication in eLife.

      We thank the reviewer for spotting this error. We will modify the schematic as suggested.

      Reviewer #2 (Public review):

      Summary:

      The authors aim to provide a comprehensive description of the neurosecretory network in the adult Drosophila brain. They sought to assign and verify the types of 80 neurosecretory cells (NSCs) found in the publicly available FlyWire female brain connectome. They then describe the organization of synaptic inputs and outputs across NSC types and outline circuits by which olfaction may regulate NSCs, and by which Corazonin-producing NSCs may regulate flight behavior. Leveraging existing transcriptomic data, they also describe the hormone and receptor expressions in the NSCs and suggest putative paracrine signaling between NSCs. Taken together, these analyses provide a framework for future experiments, which may demonstrate whether and how NSCs, and the circuits to which they belong, may shape physiological function or animal behavior.

      Strengths:

      This study uses the FlyWire female brain connectome (Dorkenwald et al. 2023) to assign putative cell types to the 80 neurosecretory cells (NSCs) based on clustering of synaptic connectivity and morphological features. The authors then verify type assignments for selected populations by matching cluster sizes to anatomical localization and cell counts using immunohistochemistry of neuropeptide expression and markers with known co-expression.

      The authors compare their findings to previous work describing the synaptic connectivity of the neurosecretory network in larval Drosophila (Huckesfeld et al., 2021), finding that there are some differences between these developmental stages. Direct comparisons between adults and larvae are made possible through direct comparison in Table 1, as well as the authors' choice to adopt similar (or equivalent) analyses and data visualizations in the present paper's figures.

      The authors extract core themes in NSC synaptic connectivity that speak to their function: different NSC types are downstream of shared presynaptic outputs, suggesting the possibility of joint or coordinated activation, depending on upstream activity. NSCs receive some but not all modalities of sensory input. NSCs have more synaptic inputs than outputs, suggesting they predominantly influence neuronal and whole-body physiology through paracrine and endocrine signaling.

      The authors outline synaptic pathways by which olfactory inputs may influence NSC activity and by which Corazonin-releasing NSCs may regulate flight. These analyses provide a basis for future experiments, which may demonstrate whether and how such circuits shape physiological function or animal behavior.

      The authors extract expression patterns of neuropeptides and receptors across NSC cell types from existing transcriptomic data (Davie et al., 2018) and present the hypothesis that NSCs could be interconnected via paracrine signaling. The authors also catalog hormone receptor expression across tissues, drawing from the Fly Cell Atlas (Li et al., 2022).

      Weaknesses:

      The clustering of NSCs by their presynaptic inputs and morphological features, along with corroboration with their anatomical locations, distinguished some, but not all cell types. The authors attempt to distinguish cell types using additional methodologies: immunohistochemistry (Figure 2), retrograde trans-synaptic labeling, and characterization of dense core vesicle characteristics in the FlyWire dataset (Figure 1, Supplement 1). However, these corroborating experiments often lacked experimental replicates, were not rigorously quantified, and/or were presented as singular images from individual animals or even individual cells of interest. The assignments of DH44 and DMS types remain particularly unconvincing.

      We thank the reviewer for this comment. We would like to clarify that the images presented in Figure 2 and Figure 1 Supplement 1 are representative images based on at least 5 independent samples. We will clarify this in the figure caption and methods. The electron micrographs showing dense core vesicle (DCV) characteristics (Figure 1 Supplement E-G) are also representative images based on examination of multiple neurons. However, we agree with the reviewer that a rigorous quantification would be useful to showcase the differences between DCVs from NSC subtypes. Therefore, we have now performed a quantitative analysis of the DCVs in putative m-NSC<sup>DH44</sup> (n=6), putative m-NSC<sup>DMS</sup> (n=6) and descending neurons (n=4) known to express DMS. For consistency, we examined the cross section of each cell where the diameter of the nucleus was the largest. We quantified the mean gray value of at least 50 DCVs per cell. Our analysis shows that the mean gray values of putative m-NSC<sup>DMS</sup> and DMS descending neurons are not significantly different, whereas the mean gray values of m-NSC<sup>DH44</sup> are significantly larger. This analysis is in agreement with our initial conclusion.

      Author response image 1.

      The authors present connectivity diagrams for visualization of putative paracrine signaling between NSCs based on their peptide and receptor expression patterns. These transcriptomic data alone are inadequate for drawing these conclusions, and these connectivity diagrams are untested hypotheses rather than results. The authors do discuss this in the Discussion section.

      We fully agree with the reviewer and will further elaborate on the limitations of our approach in the revised manuscript. However, there is a very high likelihood that a given NSC subtype can signal to another NSC subtype using a neuropeptide if its receptor is expressed in the target NSC. This is due to the fact that all NSC axons are part of the same nerve bundle (nervi corpora cardiaca) which exits the brain. The axons of different NSCs form release sites that are extremely close to each other. Neuropeptides from these release sites can easily diffuse via the hemolymph to peripheral tissues (e.g. fat body and ovaries) that are much further away from the release sites than neighboring NSCs are. We believe that neuropeptide receptors are expressed in NSCs near these release sites, where they can receive inputs not just from the adjacent NSCs but also from other sources such as the gut enteroendocrine cells. Hence, neuropeptide diffusion is not a limiting factor preventing paracrine signaling between NSCs, and receptor expression is a good indicator of putative paracrine signaling.

      Reviewer #3 (Public review):

      Summary:

      The manuscript presents an ambitious and comprehensive synaptic connectome of neurosecretory cells (NSC) in the Drosophila brain, which highlights the neural circuits underlying hormonal regulation of physiology and behaviour. The authors use EM-based connectomics, retrograde tracing, and previously characterised single-cell transcriptomic data. The goal was to map the inputs to and outputs from NSCs, revealing novel interactions between sensory, motor, and neurosecretory systems. The results are of great value for the field of neuroendocrinology, with implications for understanding how hormonal signals integrate with brain function to coordinate physiology.

      The manuscript is well-written and provides novel insights into the neurosecretory connectome in the adult Drosophila brain. Some additional behavioural experiments will significantly strengthen the conclusions.

      Strengths:

      (1) Rigorous anatomical analysis

      (2) Novel insights on the wiring logic of the neurosecretory cells.

      Weaknesses:

      (1) Functional validation of findings would greatly improve the manuscript.

      We agree with this reviewer that assessing the functional output from NSCs would improve the manuscript. Given that we currently lack genetic tools to measure hormone levels and that behaviors and physiology are modulated by NSCs on slow timescales, it is difficult to assess the immediate functional impact of the sensory inputs to NSCs using approaches such as optogenetics. However, since l-NSC<sup>CRZ</sup> are the only known cell type that provides output to descending neurons, we will functionally test this output pathway using different behavioral assays recommended by this reviewer.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      The authors present exciting new experimental data on the antigenic recognition of 78 H3N2 strains (from the beginning of the 2023 Northern Hemisphere season) against a set of 150 serum samples. The authors compare protection profiles of individual sera and find that the antigenic effect of amino acid substitutions at specific sites depends on the immune class of the sera, differentiating between children and adults. Person-to-person heterogeneity in the measured titers is strong, specifically in the group of children's sera. The authors find that the fraction of sera with low titers correlates with the inferred growth rate using maximum likelihood regression (MLR), a correlation that does not hold for pooled sera. The authors then measure the protection profile of the sera against historical vaccine strains and find that it can be explained by birth cohort for children. Finally, the authors present data comparing pre- and post- vaccination protection profiles for 39 (USA) and 8 (Australia) adults. The data shows a cohort-specific vaccination effect as measured by the average titer increase, and also a virus-specific vaccination effect for the historical vaccine strains. The generated data is shared by the authors and they also note that these methods can be applied to inform the bi-annual vaccine composition meetings, which could be highly valuable.

      Thanks for this nice summary of our paper.

      The following points could be addressed in a revision:

      (1) The authors conclude that much of the person-to-person and strain-to-strain variation seems idiosyncratic to individual sera rather than age groups. This point is not yet fully convincing. While the mean titer of an individual may be idiosyncratic to the individual sera, the strain-to-strain variation still reveals some patterns that are consistent across individuals (the authors note the effects of substitutions at sites 145 and 275/276). A more detailed analysis, removing the individual-specific mean titer, could still show shared patterns in groups of individuals that are not necessarily defined by the birth cohort.

      As the reviewer suggests, we normalized the titers for all sera to the geometric mean titer for each individual in the US-based pre-vaccination adults and children. This is only for the 2023-circulating viral strains. We then faceted these normalized titers by the same age groups we used in Figure 6, and the resulting plot is shown below. Although there are differences among virus strains (some are better neutralized than others), there are not obvious age group-specific patterns (eg, the trends in the two facets are similar). To us this suggests that at least for these relatively closely related recent H3N2 strains, the strain-to-strain variation does not obviously segregate by age group. Obviously, it is possible (we think likely) that there would be more obvious age-group specific trends if we looked at a larger swath of viral strains covering a longer time range (eg, over decades of influenza evolution). We plan to add the new plots shown below to a supplemental figure in the revised manuscript.

      Author response image 1.

      Author response image 2.
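
      The per-individual normalization described in this response can be sketched as follows. This is a minimal illustration, not the code used for the plots: the serum names, strain names, and titer values are hypothetical, and the only step taken from the text is dividing each titer by that individual's geometric mean titer across strains.

```python
import math

def normalize_to_geometric_mean(titers):
    """Divide each titer by the serum's geometric mean titer across strains."""
    # Geometric mean = exp(mean of log titers).
    log_mean = sum(math.log(t) for t in titers.values()) / len(titers)
    gmt = math.exp(log_mean)
    return {strain: t / gmt for strain, t in titers.items()}

# Hypothetical titers: serum_A is 10-fold more potent overall than serum_B,
# but both have the same strain-to-strain pattern.
sera = {
    "serum_A": {"strain_1": 100.0, "strain_2": 400.0, "strain_3": 1600.0},
    "serum_B": {"strain_1": 10.0, "strain_2": 40.0, "strain_3": 160.0},
}

normalized = {name: normalize_to_geometric_mean(t) for name, t in sera.items()}

for name, profile in normalized.items():
    print(name, {s: round(v, 3) for s, v in profile.items()})
```

      After normalization the two hypothetical sera have identical profiles, which is the point of the procedure: differences in overall potency are removed, so any remaining differences reflect strain-to-strain variation.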

      (2) The authors show that the fraction of sera with a titer below 138 correlates strongly with the inferred growth rate using MLR. However, the authors also note that there exists a strong correlation between the MLR growth rate and the number of HA1 mutations. This analysis does not yet show that the titers provide substantially more information about the evolutionary success. The actual relation between the measured titers and fitness is certainly more subtle than suggested by the correlation plot in Figure 5. For example, the clades A/Massachusetts and A/Sydney both have a positive fitness at the beginning of 2023, but A/Massachusetts has substantially higher relative fitness than A/Sydney. The growth inference in Figure 5b does not appear to map that difference, and the antigenic data would give the opposite ranking. Similarly, the clades A/Massachusetts and A/Ontario have both positive relative fitness, as correctly identified by the antigenic ranking, but at quite different times (i.e., in different contexts of competing clades). Other clades, like A/St. Petersburg are assigned high growth and high escape but remain at low frequency throughout. Some mention of these effects not mapped by the analysis may be appropriate.

      Thanks for the nice summary of our findings in Figure 5. However, the reviewer is misreading the growth charts when they say that A/Massachusetts/18/2022 has a substantially higher fitness than A/Sydney/332/2023. Figure 5a shows the frequency trajectory of different variants over time. While A/Massachusetts/18/2022 reaches a higher frequency than A/Sydney/332/2023, the trajectory is similar and the reason that A/Massachusetts/18/2022 reached a higher max frequency is that it started at a higher frequency at the beginning of 2023. The MLR growth rate estimates differ from the maximum absolute frequency reached: instead, they reflect how rapidly each strain grows relative to others. In fact, A/Massachusetts/18/2022 and A/Sydney/332/2023 have similar growth rates, as shown in Supplementary Figure 6b. Similarly, A/Saint-Petersburg/RII-166/2023 starts at a low initial frequency but then grows even as A/Massachusetts/18/2022 and A/Sydney/332/2023 are declining, and so has a higher growth rate than both of those. In the revised manuscript, we will clarify how viral growth rates are estimated from frequency trajectories, and how growth rate differs from max frequency.
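
      The distinction between growth rate and maximum frequency can be illustrated with a toy calculation. This sketch is not a multinomial logistic regression fit and uses no real data: the variant names, starting frequencies, and rates are hypothetical. It relies only on the property that, in a model of this kind, the log ratio of a variant's frequency to a reference variant's frequency changes linearly in time with a slope equal to the relative growth rate.

```python
import math

def frequencies(initial, rates, t):
    """Variant frequencies at time t under exponential growth at fixed rates."""
    weights = {v: initial[v] * math.exp(rates[v] * t) for v in initial}
    total = sum(weights.values())
    return {v: w / total for v, w in weights.items()}

def slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# Hypothetical setup: A and B grow equally fast relative to the reference,
# but A starts at a 10-fold higher frequency.
initial = {"ref": 0.78, "variant_A": 0.20, "variant_B": 0.02}
rates = {"ref": 0.0, "variant_A": 0.1, "variant_B": 0.1}
times = list(range(31))
trajectory = [frequencies(initial, rates, t) for t in times]

growth = {}
max_freq = {}
for v in ("variant_A", "variant_B"):
    log_ratio = [math.log(f[v] / f["ref"]) for f in trajectory]
    growth[v] = slope(times, log_ratio)          # relative growth rate
    max_freq[v] = max(f[v] for f in trajectory)  # highest frequency reached

print(growth)
print(max_freq)
```

      In this toy example both variants have the same estimated growth rate, yet the variant with the higher starting frequency reaches a much higher maximum frequency.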

      (3) For the protection profile against the vaccine strains, the authors find for the adult cohort that the highest titer is always against the oldest vaccine strain tested, which is A/Texas/50/2012. However, the adult sera do not show an increase in titer towards older strains, but only a peak at A/Texas. Therefore, it could be that this is a virus-specific effect, rather than a property of the protection profile. Could the authors test with one older vaccine virus (A/Perth/16/2009?) whether this really can be a general property?

      We are interested in studying immune imprinting more thoroughly using sequencing-based neutralization assays, but we note that the adults in the cohorts we studied would have been imprinted with much older strains than included in this library. As this paper focuses on the relative fitness of contemporary strains with minor secondary points regarding imprinting, these experiments are beyond the scope of this study. We’re excited for future work (from our group or others) to explore these points by making a new virus library with strains from multiple decades of influenza evolution.

      Reviewer #2 (Public review):

      This is an excellent paper. The ability to measure the immune response to multiple viruses in parallel is a major advancement for the field, which will be relevant across pathogens (assuming the assay can be appropriately adapted). I only have a few comments, focused on maximising the information provided by the sera.

      Thanks very much!

      Firstly, one of the major findings is that there is wide heterogeneity in responses across individuals. However, we could expect that individuals' responses should be at least correlated across the viruses considered, especially when individuals are of a similar age. It would be interesting to quantify the correlation in responses as a function of the difference in ages between pairs of individuals. I am also left wondering what the potential drivers of the differences in responses are, with age being presumably key. It would be interesting to explore individual factors associated with responses to specific viruses (beyond simply comparing adults versus children).

      We’re excited by this idea! We plan to include these analyses in our revised pre-print.

      Relatedly, is the phylogenetic distance between pairs of viruses associated with similarity in responses?

      As above, we like this idea and our revised pre-print will include this analysis.

      Figure 5C is also a really interesting result. To be able to predict growth rates based on titers in the sera is fascinating. As touched upon in the discussion, I suspect it is really dependent on the representativeness of the sera of the population (so, e.g., if only elderly individuals provided sera, it would be a different result than if only children provided samples). It may be interesting to compare different hypotheses - so e.g., see if a population-weighted titer is even better correlated with fitness - so the contribution from each individual's titer is linked to a number of individuals of that age in the population. Alternatively, maybe only the titers in younger individuals are most relevant to fitness, etc.

      We’re very interested in these analyses, but suggest they may be better explored in subsequent works that could sample more children, teenagers and adults across age groups. Our sera set, as the reviewer suggests, may be under-powered to perform the proposed analysis on subsetted age groups of our larger age cohorts.

      In Figure 6, the authors lump together individuals within 10-year age categories - however, this is potentially throwing away the nuances of what is happening at individual ages, especially for the children, where the measured viruses cross different groups. I realise the numbers are small and the viruses only come from a small number of years, however, it may be preferable to order all the individuals by age (y-axis) and the viral responses in ascending order (x-axis) and plot the response as a heatmap. As currently plotted, it is difficult to compare across panels.

      This is a good suggestion, and a revised pre-print will include heatmaps of the different cohorts, ordered by ages of individuals.

      Reviewer #3 (Public review):

      The authors use high-throughput neutralisation data to explore how different summary statistics for population immune responses relate to strain success, as measured by growth rate during the 2023 season. The question of how serological measurements relate to epidemic growth is an important one, and I thought the authors present a thoughtful analysis tackling this question, with some clear figures. In particular, they found that stratifying the population based on the magnitude of their antibody titres correlates more with strain growth than using measurements derived from pooled serum data. However, there are some areas where I thought the work could be more strongly motivated and linked together. In particular, how the vaccine responses in US and Australia in Figures 6-7 relate to the earlier analysis around growth rates, and what we would expect the relationship between growth rate and population immunity to be based on epidemic theory.

      Thank you for this nice summary. This reviewer also notes that the text related to figures 6 and 7 is secondary to the main story presented in figures 3-5. The main motivation for including figures 6 and 7 was to demonstrate the wide-ranging applications of sequencing-based neutralization data, and this can certainly be clarified in minor text revisions.

    1. Author Response

      Public Reviews

      We thank both reviewers for taking the time and effort to think critically about our paper and point out areas where it can be improved. In this document, we do our best to clarify any misunderstandings with the hope that further consideration about the strengths and weaknesses of our approach will be possible. Our responses are in bold.

      Reviewer #1 (Public Review):

      Summary:

      In their manuscript, Schmidlin, Apodaca, et al try to answer fundamental questions about the evolution of new phenotypes and the trade-offs associated with this process. As a model, they use yeast resistance to two drugs, fluconazole and radicicol. They use barcoded libraries of isogenic yeasts to evolve thousands of strains in 12 different environments. They then measure the fitness of evolved strains in all environments and use these measurements to examine patterns in fitness trade-offs. They identify only six major clusters corresponding to different trade-off profiles, suggesting the vast genotypic landscape of evolved mutants translates to a highly constrained phenotypic space. They sequence over a hundred evolved strains and find that mutations in the same gene can result in different phenotypic profiles.

      Overall, the authors deploy innovative methods to scale up experimental evolution experiments, and in many aspects of their approach tried to minimize experimental variation.

      We thank the reviewer for this positive assessment of our work. We are happy that the reviewer noted what we feel is a unique strength of our approach: we scaled up experimental evolution by using DNA barcodes and by exploring 12 related selection pressures. Despite this scaling up, we still see phenotypic convergence among the 774 adaptive mutants we study.

      The environments we study represent 12 different concentrations or combinations of two drugs, radicicol and fluconazole. Our hope is that this large dataset (774 mutants x 12 environments) will be useful, both to scientists who are generally interested in the genetic and phenotypic underpinnings of adaptation, and to scientists specifically interested in the evolution of drug resistance.

      Weaknesses:

      (1) One of the objectives of the authors is to characterize the extent of phenotypic diversity in terms of resistance trade-offs between fluconazole and radicicol. To minimize noise in the measurement of relative fitness, the authors only included strains with at least 500 barcode counts across all time points in all 12 experimental conditions, resulting in a set of 774 lineages passing this threshold. This corresponds to a very small fraction of the starting set of ~21 000 lineages that were combined after experimental evolution for fitness measurements.

      This is a misunderstanding that we will work to clarify in the revision. Our starting set did not include 21,000 adaptive lineages. The total number of unique adaptive lineages in this starting set is much lower than 21,000 for two reasons.

      First, ~21,000 represents the number of single colonies we isolated in total from our evolution experiments. Many of these isolates possess the same barcode, meaning they are duplicates. Second, and more importantly, most evolved lineages do not acquire adaptive mutations, meaning that many of the 21,000 isolates are genetically identical to their ancestor. In our revised manuscript, we will explicitly state that these 21,000 isolated lineages do not all represent unique, adaptive lineages. In figure 2 and all associated text, we will change the word “lineages” to “isolates,” where relevant.

      More broadly speaking, several previous studies have demonstrated that diverse genetic mutations converge at the level of phenotype, and have suggested that this convergence makes adaptation more predictable (PMID33263280, PMID37437111, PMID22282810, PMID25806684). Our study captures mutants that are overlooked in previous studies, such as those that emerge across subtly different selection pressures (e.g., 4 µg/ml vs. 8 µg/ml flu) and those that are undetectable in evolution experiments lacking DNA barcodes. Thus, while our experimental design misses some mutants (see next comment), it captures many others. Note that 774 adaptive lineages is more than most previous studies have examined. Thus, we feel that “our work – showing that 774 mutants fall into a much smaller number of groups” is important because it “contributes to growing literature suggesting that the phenotypic basis of adaptation is not as diverse as the genetic basis (lines 161 - 162).”

      As the authors briefly remark, this will bias their datasets for lineages with high fitness in all 12 environments, as all these strains must be fit enough to maintain a high abundance.

      The word “briefly” feels a bit unfair because we discuss this bias on 3 separate occasions (on lines 146 - 147, 260 - 264, and in more detail on 706 - 714). We even walk through an example of a class of mutants that our study misses. We say, “our study is underpowered to detect adaptive lineages that have low fitness in any of the 12 environments. This is bound to exclude large numbers of adaptive mutants. For example, previous work has shown some FLU resistant mutants have strong tradeoffs in RAD (Cowen and Lindquist 2005). Perhaps we are unable to detect these mutants because their barcodes are at too low a frequency in RAD environments, thus they are excluded from our collection of 774.”

      In our revised version, we will add more text to the first mention of these missing mutants (lines 146 - 147) so that the implications are more immediately made apparent.

      While we “miss” some classes of mutants, we “catch” other classes that may have been missed in previous studies of convergence. For example, we observe a unique class of FLU-resistant mutants that primarily emerged in evolution experiments that lack FLU (Figure 3). Thus, we think that the unique design of our study, surveying 12 environments, allows us to make a novel contribution to the study of phenotypic convergence.

      One of the main observations of the authors is that phenotypic space is constrained to a few clusters of roughly similar relative fitness patterns, giving hope that such clusters could be enumerated and considered to design antimicrobial treatment strategies. However, by excluding all lineages that are fit in only one or a few environments, they conceal much of the diversity that might exist in terms of trade-offs and set up an inclusion threshold that might present only a small fraction of phenotypic space with characteristics consistent with generalist resistance mechanisms or broadly increased fitness. This has important implications regarding the general conclusions of the authors regarding the evolution of trade-offs.

      We discussed these implications in some detail in the 16 lines mentioned above (146 - 147, 260 - 264, 706 - 714). To add to this discussion, we will also add the following sentence to the end of the paragraph on lines 697 - 714: “This could complicate (or even make impossible) endeavors to design antimicrobial treatment strategies that thwart resistance”.

      We will also add a new paragraph that discusses these implications earlier in our manuscript. This paragraph will highlight the strengths of our method (e.g., that we “catch” classes of mutants that are often overlooked) while being transparent about the weaknesses of our approach (e.g., that we “miss” mutants with strong tradeoffs).

      (2) Most large-scale pooled competition assays using barcodes are usually stopped after ~25 generations to avoid noise due to the emergence of secondary mutations.

      The rate at which new mutations enter a population is driven by various factors such as the mutation rate and population size, so choosing an arbitrary threshold like 25 generations is difficult.

      We conducted our fitness competition following previous work using the Levy/Blundell yeast barcode system, in which the number of generations reported varies from 32 to 40 (PMID33263280, PMID27594428, PMID37861305, see PMID27594428 for detailed calculation of the fraction of lineages biased by secondary mutations in this system).

      The authors measure fitness across ~40 generations, which is almost the same number of generations as in the evolution experiment. This raises the possibility of secondary mutations biasing abundance values, which would not have been detected by the whole genome sequencing as it was performed before the competition assay.

      We understand how the reviewer came to this misunderstanding and will adjust our revised manuscript accordingly. Previous work has demonstrated that, in this particular evolution platform, most of the mutations actually occur during the transformation that introduces the DNA barcodes (PMID25731169). In other words, these mutations do not accumulate during the 40 generations of evolution, they are already there. So the observation that we collect a genetically diverse pool of adaptive mutants after 40 generations of evolution is not evidence that 40 generations is enough time for secondary mutations to bias abundance values.

      (3) The approach used by the authors to identify and visualize clusters of phenotypes among lineages does not seem to consider the uncertainty in the measurement of their relative fitness. As can be seen from Figure S4, the inter-replicate difference in measured fitness can often be quite large. From these graphs, it is also possible to see that some of the fitness measurements do not correlate linearly (ex.: Med Flu, Hi Rad Low Flu), meaning that taking the average of both replicates might not be the best approach.

      This concern, and all subsequent concerns, seem to be driven by either (a) general concerns about the noisiness of fitness measurements obtained from large-scale barcode fitness assays or (b) general concerns about whether the clusters obtained from our dimensional reduction approach capture this noise as opposed to biologically meaningful differences.

      We will respond to each concern point-by-point, but want to start by generally stating that (a) our particular large-scale barcode fitness assay has several features that diminish noise, and (b) we devote 4 figures and 200 lines of text to demonstrating that these clusters capture biologically meaningful differences between mutants (and not noise).

      In terms of this specific concern, we performed an analysis of noise in the submitted manuscript: Our noisiest fitness measurements correspond to barcodes that are the least abundant and thus suffer the most from stochastic sampling noise. These are also the barcodes that introduce the nonlinearity the reviewer mentions. We removed these from our dataset by increasing our coverage threshold from 500 reads to 5,000 reads. The clusters did not collapse, which suggests that they were not capturing noise (Figure S7 panel B). But we agree with the reviewer that this analysis alone is not sufficient to conclude that the clusters distinguish groups of mutants with unique fitness tradeoffs.
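
      The effect of raising the coverage threshold can be sketched as a simple filter. This is an illustration under stated assumptions, not the pipeline from the manuscript: the barcode names, condition names, and read counts are hypothetical, and the sketch assumes that a lineage passes when its reads, summed across time points, meet the threshold in every condition.

```python
def passes_coverage(counts_by_condition, threshold):
    """True if summed reads across time points meet the threshold in every condition.

    Assumption: the threshold applies to the per-condition total, not to each
    individual time point.
    """
    return all(sum(counts) >= threshold for counts in counts_by_condition.values())

def filter_lineages(counts, threshold):
    """Keep only lineages that pass the coverage threshold in all conditions."""
    return {
        lineage: by_condition
        for lineage, by_condition in counts.items()
        if passes_coverage(by_condition, threshold)
    }

# Hypothetical read counts: lineage -> condition -> reads at each time point.
counts = {
    "barcode_1": {"low_flu": [3000, 2500, 2000], "high_rad": [4000, 1500, 900]},
    "barcode_2": {"low_flu": [400, 300, 200], "high_rad": [350, 150, 100]},
    "barcode_3": {"low_flu": [900, 800, 700], "high_rad": [120, 60, 20]},
}

kept_500 = filter_lineages(counts, 500)
kept_5000 = filter_lineages(counts, 5000)
print(sorted(kept_500))   # lineages passing the lenient threshold
print(sorted(kept_5000))  # lineages passing the strict threshold
```

      Here `barcode_3` fails even the lenient threshold because it is rare in one condition, which mirrors the bias toward lineages that remain abundant in all environments.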

      Because the clustering approach used does not seem to take this variability into account, it becomes difficult to evaluate the strength of the clustering, especially because the UMAP projection does not include any representation of uncertainty around the position of lineages.

      To evaluate the strength of the clustering, we performed numerous analyses including whole genome sequencing, growth experiments, reclustering, and tracing the evolutionary origins of each cluster (Figures 5 - 8). All of these analyses suggested that our clusters capture groups of mutants that have different fitness tradeoffs. We will adjust our revised manuscript to make clear that we do not rely on the results of a clustering algorithm alone to draw conclusions about phenotypic convergence.

      We are also grateful to the reviewer for helping us realize that, as written, our manuscript is not clear with regard to how we perform clustering. We are not using UMAP to decide which mutant belongs to which cluster. Recent work highlights the importance of using an independent clustering method (PMID37590228). Although this recent work addresses the challenge of clustering much higher dimensional data than we survey here, we did indeed use an independent clustering method (Gaussian mixture model). In other words, we use UMAP for visualization but not clustering. We also confirm our clustering results using a second independent method (hierarchical clustering; Figure S8). And in our revised manuscript, we will confirm them with a third method (PCA, see below). We will adjust the main text and the methods section to make these choices clearer.

      This might paint a misleading picture where clusters appear well separate and well defined but are in fact much fuzzier, which would impact the conclusion that the phenotypic space is constricted.

      The salient question is whether the clusters are so “fuzzy” that they are not meaningful. That interpretation seems unreasonable. Our clusters group mutants with similar genotypes, evolutionary histories, and fitness tradeoffs (Figures 5 - 8). Clustering mutants with similar behaviors is important and useful. It improves phenotypic prediction by revealing which mutants are likely to have at least some phenotypic effects in common. And it also suggests that the phenotypic space is constrained, at least to some degree, which previous work suggests is helpful in predicting evolution (PMID33263280, PMID37437111, PMID22282810, PMID25806684).

      (4) The authors make the decision to use UMAP and a Gaussian mixture model to cluster and represent the different fitness landscapes of their lineages of interest. Their approach has many caveats. First, compared to PCA, the axes do not provide any information about the actual dissimilarities between clusters. Using PCA would have allowed a better understanding of the amount of variance explained by components that separate clusters, as well as more interpretable components.

      The components derived from PCA are often not interpretable. It’s not obvious that each one, or even the first one, will represent some intuitive phenotype, like resistance to fluconazole.

      Moreover, we see many nonlinearities in our data. For example, fitness in a double drug environment is not predicted by adding up fitness in the relevant single drug environments. Also, there are mutants that have high fitness when fluconazole is absent or abundant, but low fitness when mild concentrations are present. These types of nonlinearities can make the axes in PCA very difficult to interpret, and they can also be missed by PCA altogether. For these reasons, we prefer other clustering methods.

      We will adjust our revised manuscript to explain these reasons why we chose UMAP and GMM over PCA.

      Also, we will include PCA in the supplement of our revised manuscript. Please find below PC1 vs PC2, with points colored according to the cluster assignment in figure 4 (i.e. using a gaussian mixture model). It appears the clusters are largely preserved.

      Author response image 1.

      Second, the advantages of dimensional reduction are not clear. In the competition experiment, 11/12 conditions (all but the no drug, no DMSO conditions) can be mapped to only three dimensions: concentration of fluconazole, concentration of radicicol, and relative fitness. Each lineage would have its own fitness landscape as defined by the plane formed by relative fitness values in this space, which can then be examined and compared between lineages.

      We worry that the idea stems from a priori notions of what the important dimensions should be. It also seems like this would miss important nonlinearities such as our observation that low fluconazole behaves more like a novel selection pressure than a dialed down version of high fluconazole.

      Also, we believe the reviewer meant “fitness profile” and not “fitness landscape”. A fitness landscape imagines a walk where every “step” is a mutation. Most lineages in barcoded evolution experiments possess only a single adaptive mutation. A single-step walk is not enough to build a landscape, though others are expanding barcoded evolution experiments beyond the first step (PMID34465770, PMID31723263), so maybe one day this will be possible.

      Third, the choice of 7 clusters as the cutoff for the multiple Gaussian model is not well explained. Based on Figure S6A, BIC starts leveling off at 6 clusters, not 7, and going to 8 clusters would provide the same reduction as going from 6 to 7. This choice also appears arbitrary in Figure S6B, where BIC levels off at 9 clusters when only highly abundant lineages are considered.

      We agree. We did not rely on the results of BIC alone to make final decisions about how many clusters to include. We thank the reviewer for pointing out this gap in our writing. We will adjust our revised manuscript to explain that we ultimately chose to describe 6 clusters that we were able to validate with follow-up experiments. In figures 5, 6, 7, and 8, we use external information to validate the clusters that we report in figure 4. And in lines 697 – 714, we explain that there may be additional clusters beyond those we tease apart in this study.
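
      For reference, the criterion under discussion has the standard form below. The symbols are introduced here for illustration: $n$ is the number of mutants, $d$ the number of dimensions in which the mixture is fit, $K$ the number of clusters, and $\hat{L}_K$ the maximized likelihood of the $K$-component model. The parameter count assumes unconstrained (full) covariance matrices, which is an assumption made for this sketch rather than a detail taken from the manuscript.

```latex
\mathrm{BIC}(K) = p_K \ln n - 2 \ln \hat{L}_K,
\qquad
p_K = \underbrace{(K - 1)}_{\text{mixing weights}}
    + \underbrace{K d}_{\text{means}}
    + \underbrace{K \, \frac{d(d+1)}{2}}_{\text{covariances}}
```

      Lower values are preferred. Because the penalty term $p_K \ln n$ grows with every added component, a BIC curve that levels off indicates that further components improve the likelihood only about as much as they cost in parameters, so the curve alone may not single out one value of $K$.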

      This directly contradicts the statement in the main text that clusters are robust to noise, as a more stringent inclusion threshold appears to increase and not decrease the optimal number of clusters. Additional criteria to BIC could have been used to help choose the optimal number of clusters or even to assess whether Gaussian mixture modeling is appropriate for this dataset.

      We are under the following impression: If our clustering method was overfitting, i.e. capturing noise, the optimal number of clusters should decrease when we eliminate noise. It increased. In other words, the observation that our clusters did not collapse (i.e. merge) when we removed noise suggests these clusters were not capturing noise.

      More generally, our validation experiments, described below, provide additional evidence that our clusters capture meaningful differences between mutants (and not noise).

      (5) Large-scale barcode sequencing assays can often be noisy and are generally validated using growth curves or competition assays.

      Some types of bar-seq methods, in particular those that look at fold change across two time points, are noisier than others that look at how frequency changes across multiple timepoints (PMID30391162). Here, we use the less noisy method. We also reduce noise by using a stricter coverage threshold than previous work (e.g., PMID33263280), and by excluding batch effects by performing all experiments simultaneously (PMID37237236).

      The main assay we use to measure fitness has been previously validated (PMID27594428). No subsequent study using this assay validates using the methods suggested by the reviewer (see PMID37861305, PMID33263280, PMID31611676, PMID29429618, PMID37192196, PMID34465770, PMID33493203).

      More to the point, bar-seq has been used, without the reviewer’s suggested validation, to demonstrate that the way some mutants’ fitness changes across environments differs from that of other mutants (PMID33263280, PMID37861305, PMID31611676, PMID33493203, PMID34596043). This is the same thing that we use bar-seq to demonstrate.

      For all of these reasons, we are hesitant to spend additional effort confirming that bar-seq itself is a valid way to infer fitness. It seems this is already accepted as a standard in our field.

      Having these types of results would help support the accuracy of the main assay in the manuscript and thus better support the claims of the authors.

      We don’t agree that fitness measurements obtained from this bar-seq assay generally require validation. But we do agree that it is important to validate whether the mutants in each of our 6 clusters indeed are different from one another in meaningful ways, in particular, in that they have different fitness tradeoffs. We have four figures (5 - 8) and 200 lines of text dedicated to validating whether our clusters capture reproducible and biologically meaningful differences between mutants. Happily, one of these figures (Fig 7) includes growth curves, which are exactly the type of validation experiment asked for by the reviewer.

      Below, we walk through the different types of validation experiments that are present in our original manuscript, and additional validation experiments that we plan to include in the revised version. We are hopeful that these validation experiments are sufficient, or at the very least, that this list empowers reviewers to point out where more work is needed.

      (1) Mutants from different clusters have different growth curves: In our original manuscript, we measured growth curves corresponding to a fitness tradeoff that we thought was surprising. Mutants in clusters 4 and 5 both have fitness advantages in single drug conditions. While mutants from cluster 4 also are advantageous in the double drug conditions, mutants from cluster 5 are not! We validated these different behaviors by studying growth curves for a mutant from each cluster (Figures 7 and S10).

      (2) Mutants from different clusters have different evolutionary origins: In our original manuscript, we came up with a novel way to ask whether the clusters capture different types of adaptive mutants. We asked whether the mutants in each cluster originate from different evolution experiments. Indeed they often do (see pie charts in Figures 6, 7, 8). This method also provides evidence supporting each cluster’s differing fitness tradeoffs.

      For example, mutants in cluster 5 appear to have a tradeoff in a double drug condition (described above). They rarely originate from that evolution condition, unlike mutants in nearby cluster 4 (see Figure 7).

      (3) Mutants from each cluster often fall into different genes: In our original manuscript, we sequenced many of these mutants and show that mutants in the same gene are often found in the same cluster. For example, all 3 IRA1 mutants are in cluster 6 (Fig 8), both GPB2 mutants are in cluster 4 (Figs 7 & 8), and 35/36 PDR mutants are in either cluster 2 or 3 (Figs 5 & 6).

      (4) Mutants from each cluster have behaviors previously observed in the literature: In our original manuscript, we compared our sequencing results to the literature and found congruence. For example, PDR mutants are known to provide a fitness benefit in fluconazole and are found in clusters that have high fitness in fluconazole (lines 457 - 462). Previous work suggests that some mutations to PDR have different tradeoffs than others, which is what we see (lines 540 - 542). IRA1 mutants were previously observed to have high fitness in our “no drug” condition, and are found in the cluster that has the highest fitness in the “no drug” condition (lines 642 - 646). Previous work even confirms the unusual fitness tradeoff we observe where IRA1 and other cluster 6 mutants have low fitness only in low concentrations of fluconazole (lines 652 - 657).

      (5) Mutants largely remain in their clusters when we use alternate clustering methods: In our original manuscript, we performed various different reclustering and/or normalization approaches on our data (Fig 6, S5, S7, S8, S9). The clusters of mutants that we observe in figure 4 do not change substantially when we recluster the data. We will add PCA (see above) to these analyses in our revised manuscript.

      (6) We will include additional data showing that mutants in different clusters have different evolutionary origins: Cluster 1 is defined by high fitness in low fluconazole that declines with increasing fluconazole (see Fig 4E and Fig 5C). In our revised manuscript, we will show that cluster 1 lineages were overwhelmingly sampled from evolutions conducted in our lowest concentration of fluconazole (see figure panel A below). No other cluster’s evolutionary history shows this pattern (figures 6, 7, and 8).

      (7) We will include additional data showing that mutants in different clusters have different growth curves: Cluster 1 lineages are unique in that their fitness advantage is specific to low flu and trades off in higher concentrations of fluconazole. We obtained growth curves for three cluster 1 mutants (2 SUR1 mutants and 1 UPC2 mutant). We compared them to growth curves for three PDR mutants (from clusters 2 and 3). Cluster 1 mutants appear to have the highest growth rates and reach a higher carrying capacity in low fluconazole (see red and green lines in Author response image 2 panel B below). But the cluster 1 mutants are negatively affected by higher concentrations of fluconazole, much more so than the mutants from clusters 2 and 3 (see Author response image 2 panel C below). This is consistent with the different fitness tradeoffs we observe for each cluster (figures 4 and 5). We will include a more detailed version of this analysis and the figures below in our revised manuscript.

      Author response image 2.

      Validation experiments demonstrate that cluster 1 mutants have uniquely high fitness in only the lowest concentration of fluconazole. (A) The mutant lineages in cluster 1 were largely sampled from evolution experiments performed in low flu. This is not true of other clusters (see pie charts in main manuscript). (B) In low flu (4 µg/ml), Cluster 1 lineages (red/UPC2 and green/SUR1) grow faster and achieve higher density than lineages from clusters 2 and 3 (blue/PDR). This is consistent with barseq measurements demonstrating that cluster 1 mutants have the highest fitness in low flu. (C) Cluster 1 lineages are sensitive to increasing flu concentrations (SUR1 and UPC2 mutants, middle and rightmost graphs). This is apparent in that the gray (8 µg/ml flu) and light blue (32 µg/ml flu) growth curves rise more slowly and reach lower density than the dark blue curves (4 µg/ml flu). But this is not the case for the PDR mutants from clusters 2 and 3 (leftmost graph). These observations are consistent with the bar-seq fitness data presented in the main manuscript (Fig 4E).

      With all of these validation efforts combined, we are hopeful that the reviewer is now more convinced that our clusters capture groups of mutants with different fitness tradeoffs (as opposed to noise). We want to conclude by saying that we are grateful to the reviewer for making us think deeply about areas where we can include additional validation efforts as well as areas where we can make our manuscript clearer.

      Reviewer #2 (Public Review):

      Summary:

      Schmidlin & Apodaca et al. aim to distinguish mutants that resist drugs via different mechanisms by examining fitness tradeoffs across hundreds of fluconazole-resistant yeast strains. They barcoded a collection of fluconazole-resistant isolates and evolved them in different environments with a view to having relevance for evolutionary theory, medicine, and genotype-phenotype mapping.

      Strengths:

      There are multiple strengths to this paper, the first of which is pointing out how much work has gone into it; the quality of the experiments (the thought process, the data, the figures) is excellent. Here, the authors seek to induce mutations in multiple environments, which is a really large-scale task. I particularly like the attention paid to isolates which are resistant to low concentrations of FLU. So often these are overlooked in favour of those conferring MIC values >64/128 etc. What was seen is different genotype and fitness profiles. I think there's a wealth of information here that will actually be of interest to more than just the fields mentioned (evolutionary medicine/theory).

      We are very grateful for this positive review. This was indeed a lot of work! We are happy that the reviewer noted what we feel is a unique strength of our manuscript: that we survey adaptive isolates across multiple environments, including low drug concentrations.

      Weaknesses:

      Not picking up low fitness lineages - which the authors discuss and provide a rationale as to why. I can completely see how this has occurred during this research, and whilst it is a shame I do not think this takes away from the findings of this paper. Maybe in the next one!

      We thank the reviewer for these words of encouragement and will work towards catching more low fitness lineages in our next project.

      In the abstract the authors focus on 'tradeoffs' yet in the discussion they say the purpose of the study is to see how many different mechanisms of FLU resistance may exist (lines 679-680), followed up by "We distinguish mutants that likely act via different mechanisms by identifying those with different fitness tradeoffs across 12 environments". Whilst I do see their point, and this is entirely feasible, I would like a bit more explanation around this (perhaps in the intro) to help lay-readers make this jump. The remainder of my comments on 'weaknesses' are relatively fixable, I think:

      We think that phrasing the “jump” as a question might help lay readers get from point A to point B. So, in the introduction of our revised manuscript, we will add a paragraph roughly similar to this one: “If two groups of drug-resistant mutants have different fitness tradeoffs, does it mean that they provide resistance through different underlying mechanisms? Alternatively, it could mean that both provide drug resistance via the same mechanism, but some mutations come with a cost that others don’t pay. However, another way to phrase this alternative is to say that both groups of mutants affect fitness through different suites of mechanisms that are only partially overlapping. And so, by identifying groups of mutants with different fitness tradeoffs, we argue that we will be uncovering sets of mutations that impact fitness through different underlying mechanisms. The ability to do so would be useful for genotype-phenotype mapping endeavors.”

      In the introduction I struggle to see how this body of research fits in with the current literature, as the literature cited is a hodge-podge of bacterial and fungal evolution studies, which are very different! For example, the authors state "previous work suggests that mutants with different fitness tradeoffs may affect fitness through different molecular mechanisms" (lines 129-131) and then cite three papers, only one of which is a fungal research output. However, the next sentence focuses solely on literature from fungal research. Citing bacterial work as a foundation is fine, but as you're using yeast for this I think tailoring the introduction more to what is and isn't known in fungi would be more appropriate. It would also be great to then circle back around and mention monotherapy vs combination drug therapy for fungal infections as a rationale for this study. The study seems to be focused on FLU-resistant mutants, which is the first-line drug of choice, but many (yeast) infections have acquired resistance to this and combination therapy is the norm.

      In our revised manuscript, we will carefully review all citations. The issue may stem from our attempt to reach two different groups of scientists. We ourselves are broadly interested in the structure of the genotype-phenotype-fitness map (PMID33263280, PMID32804946). Though the 3 papers the reviewer mentions on lines 132 - 133 all pertain to yeast, we cite them because they are studies about the complexity of this map. Their conclusions, in theory, should apply broadly, beyond yeast. Similarly, the reason we cite papers from yeast, as well as bacteria and cancer, is that we believe general conclusions about the genotype-phenotype-fitness map should apply broadly. For example, the sentence the reviewer highlights, “previous work suggests that mutants with different fitness tradeoffs may affect fitness through different molecular mechanisms” is a general observation about the way genotype maps to fitness. So we cited papers from across the tree of life to support this sentence.

      On the other hand, because we study drug resistant mutations, we also hope that our work is of use to scientists studying the evolution of resistance. We agree with the reviewer that in this regard, some of our findings may be especially pertinent to the evolution of resistance to antifungal drugs. We will consider this when reviewing the citations in our revised manuscript and add some text to clarify these points.

      Methods: Line 769 - which yeast? I haven't even seen mention of which species is being used in this study; different yeast employ different mechanisms of adaptation for resistance, so could greatly impact the results seen. This could help with some background context if the species is mentioned (although I assume S. cerevisiae).

      In the revised manuscript, we will make clear that we study S. cerevisiae.

      In which case, should aneuploidy be considered as a mechanism? This is mentioned briefly on line 556, but with all the sequencing data acquired this could be checked quickly?

      We like this idea and we are working on it, but it is not straightforward. The reviewer is correct in that we can use the sequencing data that we already have. But calling aneuploidy with certainty is tough because its signal can be masked by noise. In other words, some regions of the genome may be sequenced more than others by chance. Given this is not straightforward, at least not for us, this analysis will likely have to wait for a subsequent paper.

      I think the authors could be bolder and try and link this to other (pathogenic) yeasts. What are the implications of this work on say, Candida infections?

      Perhaps because our background lies in general study of the genotype-phenotype map, we did not want to make bold assertions about how our work might apply to pathogenic yeasts. But we see how this could be helpful and will add some discussion points about this. Specifically, we will discuss which of the genes and mutants we observe are also found in Candida. We will also investigate whether our observation that low fluconazole represents a seemingly unique challenge, not just a milder version of high fluconazole, has any corollary in the Candida literature.

    1. Author Response

      Reviewer 1 (Public Review):

      1. With respect to the predictions, the authors propose that the subjects, depending on their linguistic background and the length of the tone in a trial, can put forward one or two predictions. The first is a short-term prediction based on the statistics of the previous stimuli and identical for both groups (i.e. short tones are expected after long tones and vice versa). The second is a long-term prediction based on their linguistic background. According to the authors, after a short tone, Basque speakers will predict the beginning of a new phrasal chunk, and Spanish speakers will predict it after a long tone.

      In this way, when a short tone is omitted, Basque speakers would experience the violation of only one prediction (i.e. the short-term prediction), but Spanish speakers will experience the violation of two predictions (i.e. the short-term and long-term predictions), resulting in a higher amplitude MMN. The opposite would occur when a long tone is omitted. So, to recap, the authors propose that subjects will predict the alternation of tone durations (short-term predictions) and the beginning of new phrasal chunks (long-term predictions).

      The problem with this is that subjects are also likely to predict the completion of the current phrasal chunk. In speech, phrases are seldom left incomplete. In Spanish, it is very unlikely to hear a function-word that is not followed by a content-word (and the opposite happens in Basque). On the contrary, after the completion of a phrasal chunk, a speaker might stop talking and a silence might follow, instead of the beginning of a new phrasal chunk.

      Considering that the completion of a phrasal chunk is more likely than the beginning of a new one, the prior endowed to the participants by their linguistic background should make us expect a pattern of results actually opposite to the one reported here.

      Response: We acknowledge the plausibility of the hypothesis advanced by Reviewer #1. We would like to further clarify the rationale that led us to predict that the hypothesized long-term predictions should manifest at the onset of (and not within) a “phrasal chunk”. The hypothesis does not directly concern the probability of a short event to follow a long one (or the other way around), which to our knowledge has not been systematically quantified in previous cross-linguistic studies. Rather, it concerns how the auditory system forms higher-level auditory chunks based on the rhythmic properties of the native language, which is what the previous behavioral studies on perceptual grouping have addressed (e.g., Iversen 2008; Molnar et al. 2014; Molnar et al. 2016). When presented with sequences of two tones alternating in duration, Spanish speakers typically report perceiving the auditory stream as a repetition of short-long chunks separated by a pause, while speakers of Basque usually report the opposite long-short grouping bias. These results suggest that the auditory system performs a chunking operation by grouping pairs of tones into compressed, higher-level auditory units (often perceived as a single event). The way two constituent tones are combined depends on linguistic experience. Based on this background, we hypothesized the presence of (i) a short-term system that merely encodes a repetition of alternations rule and predicts transitions from one constituent tone to the other (a → b → a → b, etc.); (ii) a long-term system that encodes a repetition of concatenated alternations rule and predicts transitions from one high-level unit to the other (ab → ab, etc.). Under this view, we expect predictions based on the long-term system to be stronger at the onset of (rather than within) high-level units and therefore omissions of the first constituent tone to elicit larger responses than omissions of the second constituent tone.

      In other words, the omission of the onset tone would reflect the omission of the whole chunk. On the other hand, the omission of the internal tone would be better handled by the short-term system, involved in processing the low-level structure of our sequences.

      A similar concern was also raised by Reviewer #2. We will include the view proposed by Reviewer #1 and Reviewer #2 in the updated version of the manuscript.

      1. The authors report an interaction effect that modulates the amplitude of the omission response, but caveats make the interpretation of this effect somewhat uncertain. The authors report a widespread omission response, which resembles the classical mismatch response (in MEG) with strong activations in sensors over temporal regions. Instead, the interaction found is circumscribed to four sensors that do not overlap with the peaks of activation of the omission response.

      Response: We appreciate that all three reviewers agreed on the robustness of the data analysis pipeline. The approach employed to identify the presence of an interaction effect was indeed conservative, using a non-parametric test on combined gradiometers data, no a priori assumptions regarding the location of the effect, and small cluster thresholds (cfg.clusteralpha = 0.05) to enhance the likelihood of detecting highly localized clusters with large effect sizes. This approach led to the identification of the cluster illustrated in Figure 2c, where the interaction effect is evident. The fact that this interaction effect arises in a relatively small cluster of sensors does not alter its statistical robustness. The only partial overlap of the cluster with the activation peaks might simply reflect the fact that distinct sources contribute to the generation of the omission-MMN, which has been demonstrated in numerous prior studies (e.g., Zhang et al., 2018; Ross & Hamm, 2020).
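      The logic of the cluster-based non-parametric test described above can be sketched in simplified form. This is not the FieldTrip pipeline used in the study: it assumes a one-dimensional chain of neighbouring samples instead of a sensor neighbourhood layout, takes each participant's long-minus-short omission difference as input so that a group difference corresponds to the interaction, and uses a fixed t threshold in place of cfg.clusteralpha. All function names and data are hypothetical.

```python
import math
import random


def two_sample_t(a, b):
    """Two-sample t statistic; returns 0.0 if both groups have zero variance."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    var_a = sum((x - mean_a) ** 2 for x in a) / (len(a) - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (len(b) - 1)
    denom = math.sqrt(var_a / len(a) + var_b / len(b))
    return (mean_a - mean_b) / denom if denom > 0 else 0.0


def max_cluster_mass(group_a, group_b, t_threshold):
    """Largest summed |t| over runs of adjacent same-sign samples above threshold."""
    best = current = 0.0
    previous_sign = 0
    for i in range(len(group_a[0])):
        t = two_sample_t([p[i] for p in group_a], [p[i] for p in group_b])
        sign = (t > t_threshold) - (t < -t_threshold)
        if sign != 0 and sign == previous_sign:
            current += abs(t)      # extend the current cluster
        elif sign != 0:
            current = abs(t)       # start a new cluster
        else:
            current = 0.0          # sample below threshold: no cluster
        previous_sign = sign
        best = max(best, current)
    return best


def cluster_permutation_p(group_a, group_b, t_threshold, n_permutations=200, seed=0):
    """Permute group labels and compare the observed mass with the null maxima."""
    rng = random.Random(seed)
    observed = max_cluster_mass(group_a, group_b, t_threshold)
    pooled = group_a + group_b
    exceed = 0
    for _ in range(n_permutations):
        shuffled = pooled[:]
        rng.shuffle(shuffled)
        perm_a, perm_b = shuffled[:len(group_a)], shuffled[len(group_a):]
        if max_cluster_mass(perm_a, perm_b, t_threshold) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_permutations + 1)


# Hypothetical data: 8 participants per group, 6 adjacent samples,
# with a group difference confined to samples 2 and 3.
noise = [-0.3, 0.1, 0.4, -0.2, 0.3, -0.1, 0.2, -0.4]
group_b = [[noise[(p + s) % 8] for s in range(6)] for p in range(8)]
group_a = [
    [noise[(p + 2 * s + 3) % 8] + (3.0 if 2 <= s <= 3 else 0.0) for s in range(6)]
    for p in range(8)
]
# 2.14 is roughly the two-sided 5% critical t value for 14 degrees of freedom.
observed_mass, p_value = cluster_permutation_p(group_a, group_b, t_threshold=2.14)
```

      Because the null distribution is built from the maximum cluster mass across the whole array, a cluster can reach significance regardless of how few samples it spans, which is the point made above about small clusters.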

      Furthermore, the boxplot in Figure 2E suggests that part of the interaction effect might be due to the presence of two outliers (if removed, the effect is no longer significant). Overall, it is possible that the reported interaction is driven by a main effect of omission type which the authors report, and find consistently only in the Basque group (showing a higher amplitude omission response for long tones than for short tones). Because of these points, it is difficult to interpret this interaction as a modulation of the omission response.

      Response: The two participants mentioned by Reviewer #1, despite being somewhat distant from the rest of the group, are not outliers according to the standard Tukey’s rule. As shown in Author response image 1 below, no participant fell outside the upper (Q3 + 1.5×IQR) and lower (Q1 - 1.5×IQR) whiskers of the boxplot.

      Author response image 1.

      The presence of a main effect of omission type does not impact the interpretation of the interaction, especially considering that these effects emerge over distinct clusters of channels.

      The code to generate Author response image 1 and the corresponding statistics have been added to the script “analysis_interaction_data.R” in the OSF folder (https://osf.io/6jep8/).
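      For readers who want to see the logic of Tukey's rule without opening the R script, a minimal sketch follows. It is not the code in the OSF folder: the function names and example values are hypothetical, and quartiles are computed with Python's "inclusive" interpolation, which may differ slightly from the hinges used by a boxplot routine.

```python
import statistics


def tukey_fences(values, k=1.5):
    """Return the lower and upper fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def tukey_outliers(values, k=1.5):
    """List the values that fall outside the Tukey fences."""
    low, high = tukey_fences(values, k)
    return [v for v in values if v < low or v > high]


# Hypothetical amplitudes: 12 sits apart from the rest but stays inside the fences.
distant_but_inside = tukey_outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 12])
# A more extreme value of 20 would be flagged.
flagged = tukey_outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 20])
```

      The first example mirrors the situation described above: a value can look separated from the group and still not qualify as an outlier under the 1.5×IQR criterion.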

      It should also be noted that in the source analysis, the interaction only showed a trend in the left auditory cortex, but in its current version the manuscript does not report the statistics of such a trend.

      Response: Our interpretation of the results for the present study is mainly driven by the effect observed on sensor-level data, which is statistically robust. The source modeling analyses (in non-invasive electrophysiology) provide a possible model of the candidate brain sources driving the effect observed at the sensor level. The source showing the interactive effect in our study is the left auditory cortex. More details and statistics will be provided in the revised version of the manuscript.

      Reviewer #2 (Public Review):

      1. Despite the evidence provided on neural responses, the main conclusion of the study reflects a known behavioral effect on rhythmic sequence perceptual organization driven by linguistic background (Molnar et al. 2016, particularly). Also, the authors themselves provide a good review of the literature that evidences the influence of long-term priors in neural responses related to predictive activity. Thus, in my opinion, the strength of the statements the authors make on the novelty of the findings may be a bit far-fetched in some instances.

      Response: We will consider the suggestion of Reviewer #2 for the new version of the manuscript. Overall, we believe that the novelty of the current study lies in bridging together findings from two research fields - basic auditory neuroscience and cross-linguistic research - to provide evidence for a predictive coding model in the auditory system that uses long-term priors to make perceptual inferences.

      1. Albeit the paradigm is well designed, I fail to see the grounding of the hypotheses laid by the authors as framed under the predictive coding perspective. The study assumes that responses to an omission at the beginning of a perceptual rhythmic pattern will be stronger than at the end. I feel this is unjustified. If anything, omission responses should be larger when the gap occurs at the end of the pattern, as that would be where stronger expectations are placed: if in my language a short sound occurs after a long one, and I perceptually group tone sequences of alternating tone duration accordingly, when I hear a short sound I will expect a long one following; but after a long one, I don't necessarily need to expect a short one, as something else might occur.

      Response: A similar point was advanced by Reviewer #1. We tried to clarify our hypothesis (see above). We will consider including this interpretation in the updated version of the manuscript.

      1. In this regard, it is my opinion that what is reflected in the data may be better accounted for (or at least, additionally) by a different neural response to an omission depending on the phase of an underlying attentional rhythm (in terms of Large and Jones rhythmic attention theory, for instance) and putative underlying entrained oscillatory neural activity (in terms of Lakatos' studies, for instance). Certainly, the fact that the aligned phase may differ depending on linguistic background is very interesting and would reflect the known behavioral effect.

      Response: We thank the reviewer for this comment, which is indeed very pertinent. Below are some comments highlighting our thoughts on this.

      1) We will explore in more detail the possibility that the aligned phase may differ depending on linguistic background, which is indeed very interesting. However, we believe that even if a phase modulation by language experience is found, it would not negate the possibility that the group differences in the MMN are driven by different long-term predictions. Rather, since the hypothesized phase differences would be driven by long-term linguistic experience, phase entrainment may reflect a mechanism through which long-term predictions are carried. On this point, we agree with the Reviewer, who says that “this view would not change the impact of the results but add depth to their interpretation”.

      2) Related to the point above: Although evoked responses and oscillations are often considered distinct electrophysiological phenomena, current evidence suggests that these phenomena are interconnected (e.g., Studenova et al., 2023). In our view, the hypotheses that the MMN reflects differences in phase alignment and long-term prediction errors are not mutually exclusive.

      3) Despite the plausibility of the view proposed by reviewer #2, many studies in the auditory neuroscience literature putatively consider the MMN as an index of prediction error (e.g., Bendixen et al., 2012; Heilbron and Chait, 2018). There are good reasons to believe that also in our study the MMN reflects, at least in part, an error response.

      In the updated version of the manuscript, we will include a paragraph discussing the possibility that the reported group differences in the omission MMN might be partially accounted for by differences in neural entrainment to the rhythmic sound sequences.

      Reviewer #3 (Public Review):

      The main weaknesses are the strength of the effects and generalisability. The sample size is also relatively small by today's standards, with N=20 in each group. Furthermore, the crucial effects are all mostly in the .01<P<.05 range, such as the crucial interaction P=.03. It would be nice to see it replicated in the future, with more participants and other languages. It would also have been nice to see behavioural data that could be correlated with neural data to better understand the real-world consequences of the effect.

      Response: We appreciate the positive feedback from Reviewer #3. Concerning the weaknesses highlighted: we agree with Reviewer #3 that it would be nice to see this study replicated in the future with larger sample sizes and a behavioral counterpart. Overall, we hope this work will lead to more studies using cross-linguistic/cultural comparisons to assess the effect of experience on neural processing. In the context of the present study, we believe that the lack of behavioral data does not undermine the main findings of this study, given the careful selection of the participants and the well-known robustness of the perceptual grouping effect (e.g., Iversen 2008; Yoshida et al., 2010; Molnar et al. 2014; Molnar et al. 2016). As highlighted by Reviewer #2, having Spanish and Basque dominant “speakers as a sample equates that in Molnar et al. (2016), and thus overcomes the lack of direct behavioral evidence for a difference in rhythmic grouping across linguistic groups. Molnar et al. (2016)'s evidence on the behavioral effect is compelling, and the evidence on neural signatures provided by the present study aligns with it.”

      References

      1. Bendixen, A., SanMiguel, I., & Schröger, E. (2012). Early electrophysiological indicators for predictive processing in audition: a review. International Journal of Psychophysiology, 83(2), 120-131.

      2. Heilbron, M., & Chait, M. (2018). Great expectations: is there evidence for predictive coding in auditory cortex? Neuroscience, 389, 54-73.

      3. Iversen, J. R., Patel, A. D., & Ohgushi, K. (2008). Perception of rhythmic grouping depends on auditory experience. The Journal of the Acoustical Society of America, 124(4), 2263-2271.

      4. Molnar, M., Lallier, M., & Carreiras, M. (2014). The amount of language exposure determines nonlinguistic tone grouping biases in infants from a bilingual environment. Language Learning, 64(s2), 45-64.

      5. Molnar, M., Carreiras, M., & Gervain, J. (2016). Language dominance shapes non-linguistic rhythmic grouping in bilinguals. Cognition, 152, 150-159.

      6. Ross, J. M., & Hamm, J. P. (2020). Cortical microcircuit mechanisms of mismatch negativity and its underlying subcomponents. Frontiers in Neural Circuits, 14, 13.

      7. Simon, J., Balla, V., & Winkler, I. (2019). Temporal boundary of auditory event formation: An electrophysiological marker. International Journal of Psychophysiology, 140, 53-61.

      8. Studenova, A. A., Forster, C., Engemann, D. A., Hensch, T., Sander, C., Mauche, N., ... & Nikulin, V. V. (2023). Event-related modulation of alpha rhythm explains the auditory P300 evoked response in EEG. bioRxiv, 2023-02.

      9. Yoshida, K. A., Iversen, J. R., Patel, A. D., Mazuka, R., Nito, H., Gervain, J., & Werker, J. F. (2010). The development of perceptual grouping biases in infancy: A Japanese-English cross-linguistic study. Cognition, 115(2), 356-361.

      10. Zhang, Y., Yan, F., Wang, L., Wang, Y., Wang, C., Wang, Q., & Huang, L. (2018). Cortical areas associated with mismatch negativity: A connectivity study using propofol anesthesia. Frontiers in Human Neuroscience, 12, 392.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      This study presents a new Bayesian approach to estimate importation probabilities of malaria, combining epidemiological data, travel history, and genetic data through pairwise IBD estimates. Importation is an important factor challenging malaria elimination, especially in low-transmission settings. This paper focuses on Magude and Matutuine, two districts in southern Mozambique with very low malaria transmission. The results show isolation-by-distance in Mozambique, with genetic relatedness decreasing with distances larger than 100 km, no spatial correlation for distances between 10 and 100 km, and again strong spatial correlation at distances smaller than 10 km. They report high genetic relatedness between Matutuine and Inhambane, higher than between Matutuine and Magude. Inhambane is the main source of importation in Matutuine, accounting for 63.5% of imported cases. Magude, on the other hand, shows smaller importation and travel rates than Matutuine, as it is a rural area with less mobility. Additionally, they report higher levels of importation and travel in the dry season, when transmission is lower. Also, no association with importation was found for occupation, sex, and other factors. These data have practical implications for public health strategies aiming for malaria elimination, for example, testing and treating travelers from Matutuine in the dry season.

      Strengths:

      The strength of this study lies in the combination of different sources of data - epidemiological, travel, and genetic data - to estimate importation probabilities, and the statistical analyses.

      Weaknesses:

      The authors recognize the limitations related to sample size and the biases of travel reports.

      Thank you for your review and consideration. As mentioned, we state in the manuscript the limitations related to sample sizes and travel reports. We plan to continue this study with new prospective data in order to address these limitations.

      Reviewer #2 (Public review):

      Summary:

      Based on a detailed dataset, the authors present a novel Bayesian approach to classify malaria cases as either imported or locally acquired.

      Strengths:

      The proposed Bayesian approach for case classification is simple, well justified, and allows the integration of parasite genomics, travel history, and epidemiological data. The work is well-written, very organized, and brings important contributions both to malaria control efforts in Mozambique and to the scientific community. Understanding the origin of cases is essential for designing more effective control measures and elimination strategies.

      Weakness:

      While the authors aim to classify cases as imported or locally acquired, the work lacks a quantification of the contribution of each case type to overall transmission.

      The Bayesian rationale is sound and well justified; however, the formulation appears to present an inconsistency that is replicated in both the main text and the Supplementary Material.

      In fact, one of the questions that remains unanswered is the overall contribution of importation events to transmission in the areas. While the Bayesian classifier does not quantify this, our future analysis will focus on combining outbreak detection, genetic clustering and importation classification to quantify the contribution of imported cases to outbreak resurgence and to the overall transmission.

      Thank you for pointing out the inconsistency in the final formula. The final formula in fact corresponds to P(I<sub>A</sub> | G) rather than to P(I<sub>A</sub>), so:

      instead of

      We will correct this error in a new version of the manuscript.
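
      As an illustration only, the distinction between a prior importation probability and a posterior conditioned on genetic data can be sketched with a generic two-class Bayes update. This is not the manuscript's formula: the function name, the binary imported/local framing, and all numerical values below are hypothetical assumptions chosen to show why the quantity on the left-hand side must be written as conditional on G.

```python
def posterior_importation(prior_imported, lik_g_given_imported, lik_g_given_local):
    """Generic Bayes update: P(imported | G) from a prior and two likelihoods.

    All arguments are hypothetical illustration values, not manuscript estimates.
    """
    if not 0.0 <= prior_imported <= 1.0:
        raise ValueError("prior_imported must lie in [0, 1]")
    numerator = lik_g_given_imported * prior_imported
    denominator = numerator + lik_g_given_local * (1.0 - prior_imported)
    if denominator == 0.0:
        raise ValueError("genetic data has zero probability under both classes")
    return numerator / denominator


# Hypothetical example: prior of 0.2, genetic data six times more likely if imported.
p_imported_given_g = posterior_importation(0.2, 0.6, 0.1)
```

      In this sketch the prior stands in for the travel and epidemiological information, and the result differs from the prior whenever the two likelihoods differ.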

      Reviewer #3 (Public review):

      The authors present an important approach to identify imported P. falciparum malaria cases, combining genetic and epidemiological/travel data. This tool has the potential to be expanded to other contexts. The data was analyzed using convincing methods, including a novel statistical model; although some recognized limitations can be improved. This study will be of interest to researchers in public health and infectious diseases.

      Strengths:

      The study has several strengths, mainly the development of a novel Bayesian model that integrates genomic, epidemiological, and travel data to estimate importation probabilities. The results showed insights into malaria transmission dynamics, particularly identifying importation sources and differences in importation rates in Mozambique. Finally, the relevance of the findings is to suggest interventions focusing on the traveler population to help efforts for malaria elimination.

      Weaknesses:

      The study also has some limitations. The sample collection was not representative of some provinces, and not all samples had sufficient metadata for risk factor analysis, which can also be affected by travel recall bias. Additionally, the authors used a proxy for transmission intensity and assumed some conditions for the genetic variable when calculating the importation probability for specific scenarios. The weaknesses were assessed by the authors.

      We acknowledge the limitations noted by the reviewer and plan to address them as follows. We will repeat the study with our data collected in 2023, which this time contain a good representation of all the provinces of Mozambique; completeness of the metadata collection was ensured by implementing a new protocol in January 2023. Regarding the proxy for transmission intensity, we will refine the model by integrating monthly estimates of malaria incidence (previously calibrated to address testing and reporting rates) from the DHIS2 data, also taking into account the date of the reported cases in the analysis.

    1. Author Response

      We are grateful to the editors for considering our manuscript and facilitating the peer review process. Importantly, we would like to express our gratitude to reviewers for their constructive comments. Given eLife’s publishing format, we provide an initial author response now, which will be followed by a revised manuscript in the near future. Please find our responses below.

      eLife Assessment

      This study presents a valuable insight into a computational mechanism of pain perception. The evidence supporting the authors’ claims is solid, although the inclusion of 1) more diverse candidate computational models, 2) more systematic analysis of the temporal regularity effects on the model fit, and 3) tests on clinical samples would have strengthened the study. The work will be of interest to pain researchers working on computational models and cognitive mechanisms of pain in a Bayesian framework.

      Thank you very much again for considering the manuscript and judging it as a valuable contribution to understanding mechanisms of pain perception. We recognise the above-mentioned points of improvement and elaborate on them in the initial response to the reviewers.

      Reviewer 1

      Reviewer Comment 1.1 — Selection of candidate computational models: While the paper juxtaposes the simple model-free RL model against a Kalman Filter model in the context of pain perception, the rationale behind this choice remains ambiguous. It prompts the question: could other RL-based models, such as model-based RL or hierarchical RL, offer additional insights? A more detailed explanation of their computational model selection would provide greater clarity and depth to the study.

      Thank you for this point. Our models were selected a priori, following the modelling strategy of Jepma et al. (2018); we therefore considered the same set of core models, so that the analysis extends clearly to our non-cue paradigm. The key question for us was whether expectations were used to weight the behavioural estimates, so our main interest was to compare expectation-weighted vs non-expectation-weighted models.

      Model-based and hierarchical RL are very broad terms that can be used to refer to many different models, and we are not clear about which specific models the reviewer is referring to. Our Bayesian models are generative models, i.e. they learn the generative statistics of the environment (which is characterised by inherent stochasticity and volatility) and hence operate model-based analyses of the stimulus dynamics. In our case, this happened hierarchically and it was combined with a simple RL rule.

      Reviewer Comment 1.2 — Effects of varying levels of volatility and stochasticity: The study commendably integrates varying levels of volatility and stochasticity into its experimental design. However, the depth of analysis concerning the effects of these variables on model fit appears shallow. A looming concern is whether the superior performance of the expectation-weighted Kalman Filter model might be a natural outcome of the experimental design. While the non-significant difference between eKF and eRL for the high stochasticity condition somewhat alleviates this concern, it raises another query: Would a more granular analysis of volatility and stochasticity effects reveal fine-grained model fit patterns?

      We are sorry that the reviewer finds “the depth of analysis concerning the effects of these variables on model fit” shallow. We are not sure which analysis the reviewer has in mind when suggesting a “more granular analysis of volatility and stochasticity effects” to “reveal fine-grained model fit patterns”, which makes it difficult for us to improve our manuscript in this regard. We are happy to add analyses to our paper, but we would be grateful for some specific pointers. We have already provided:

      • Analysis of model-naive performance across different levels of stochasticity and volatility (section 2.3, figure 3, supplementary information section 1.1 and tables S1-2)

      • Model fitting for each stochasticity/volatility condition (section 2.4.1, figure 4, supplementary table S5)

      • Group-level and individual-level differences of each model parameter across stochasticity/volatility conditions (supplementary information section 7, figures S4-S5).

      • Effect of confidence on scaling factor for each stochasticity/volatility condition (figure 5)

      Reviewer Comment 1.3 — Rating instruction: According to Fig. 1A, participants were prompted to rate their responses to the question, “How much pain DID you just feel?” and to specify their confidence level regarding their pain. It is difficult for me to understand the meaning of confidence in this context, given that they were asked to report their subjective feelings. It might have been better to query participants about perceived stimulus intensity levels. This perspective is seemingly echoed in lines 100-101, “the primary aim of the experiment was to determine whether the expectations participants hold about the sequence inform their perceptual beliefs about the intensity of the stimuli.”

      Thank you for raising this question, which allows us to clarify our paradigm. On half of the trials, participants were asked to report the perceived intensity of the previous stimulus; on the remaining trials, participants were requested to predict the intensity of the next stimulus. Therefore, we did query “participants about perceived stimulus intensity levels”, as described at lines 49-55, 296-303, and depicted in figure 1.

      The confidence refers to the level of confidence that participants have regarding their rating - how sure they are. This is done in addition to their perceived stimulus intensity and it has been used in a large body of previous studies in any sensory modality.

      Reviewer Comment 1.4 — Relevance to clinical pain: While the authors underscore the relevance of their findings to chronic pain, they did not include data pertaining to clinical pain. Notably, their initial preprint seemed to encompass data from a clinical sample (https://www.medrxiv.org/content/10.1101/2023.03.23.23287656v1), which, for reasons unexplained, has been omitted in the current version. Clarification on this discrepancy would be instrumental in discerning the true relevance of the study’s findings to clinical pain scenarios.

      The preprint that the Reviewer is referring to was an older version of the manuscript in which we combined two different experiments, which were initially born as separate studies: the one that we submitted to eLife (done in the lab, with noxious stimuli in healthy participants) and an online study with a different statistical learning paradigm (without noxious stimuli, in chronic back pain participants). Unfortunately, the paradigms were different and not directly comparable. Indeed, following submission to a different journal, the manuscript was criticised for this reason. We therefore split the paper in two, and submitted the first study to eLife. We are now planning to perform the same lab-based experiment with noxious stimuli on chronic back pain participants. Progress on this front has been slowed down by the fact that I (Flavia Mancini) am on maternity leave, but it remains top priority once back to work.

      Reviewer Comment 1.5 — Paper organization: The paper’s organization appears a little bit weird, possibly due to the removal of significant content from their initial preprint. Sections 2.1-2.2 and 2.4 seem more suitable for the Methods section, while 2.3 and 2.4.1 are the only parts that present results. In addition, enhancing clarity through graphical diagrams, especially for the experimental design and computational models, would be quite beneficial. A reference point could be Fig. 1 and Fig. 5 from Jepma et al. (2018), which similarly explored RL and KF models.

      Thank you for these suggestions. We will consider restructuring the paper in the revised version.

      Reviewer 2

      Reviewer Comment 2.1 — This is a highly interesting and novel finding with potential implications for the understanding and treatment of chronic pain where pain regulation is deficient. The paradigm is clear, the analysis is state-of-the-art, the results are convincing, and the interpretation is adequate.

      Thank you very much for these positive comments.

      Reviewer 3

      We are very grateful for the reviewer’s insightful comments and useful guidance regarding our methodology. We are also thankful to the reviewer for highlighting the strengths of our manuscript. Below we respond to the individual weaknesses mentioned in the reviewer’s report.

      Reviewer Comment 3.1 — In Figure 1, panel C, the authors illustrate the stimulation intensity, perceived intensity, and prediction intensity on the same scale, facilitating a more direct comparison. It appears that the stimulation intensity has been mathematically transformed to fit a scale from 0 to 100, aligning it with the intensity ratings corresponding to either past or future stimuli. Given that the pain threshold is specifically marked at 50 on this scale, one could logically infer that all ratings falling below this value should be deemed non-painful. However, I find myself uncertain about this interpretation, especially in relation to the term “arbitrary units” used in the figure. I would greatly appreciate clarification on how to accurately interpret these units, as well as an explanation of the relationship between these values and the definition of pain threshold in this experiment.

      Indeed, as detailed in Methods section 4.3, the stimulation intensity was transformed from the original 1-13 scale to a 0-100 scale to match the scales in the participant response screens. Following the method used to establish the pain threshold, we set the stimulus intensity of 7 as the threshold on the original 1-13 scale. However, during the rating part of the experiment, several participants never or very rarely selected a value above 50 (their individually defined pain threshold), despite having indicated, during the pain threshold procedure, the moment at which a stimulus becomes painful. As a result, the re-scaled intensity values as well as the perception ratings, both on the same 0-100 scale of arbitrary units, never go above the pain threshold. Please see all participant ratings and inputs in the figure below. We agree that it would be more illustrative to re-plot Figure 1 with a different exemplary participant, whose ratings go above the pain threshold, perhaps with the input intensity on the 1-13 scale shown on an additional right-hand y-axis. We will add this in the revised version and highlight the point above.
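
      For concreteness, the re-scaling described above can be sketched as a linear map. This is a minimal sketch under the assumption that the transformation is linear between the scale endpoints (an assumption that is consistent with intensity 7 landing on 50); the function name and argument names are hypothetical.

```python
def rescale_intensity(level, old_min=1.0, old_max=13.0, new_min=0.0, new_max=100.0):
    """Linearly map a stimulus level from the 1-13 scale onto the 0-100 rating scale.

    Assumes a simple linear transformation between the two scale endpoints.
    """
    if not old_min <= level <= old_max:
        raise ValueError("level lies outside the original scale")
    fraction = (level - old_min) / (old_max - old_min)
    return new_min + fraction * (new_max - new_min)


# The threshold intensity of 7 on the 1-13 scale maps to the midpoint of 0-100.
threshold_on_rating_scale = rescale_intensity(7)
```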

      Importantly, while values below 50 are deemed non-painful by participants, the thermal stimulation still activates C-fibres involved in nociception, and we would argue that the modelling framework and analysis still applies in this case.

      Reviewer Comment 3.2 — The method of generating fluctuations in stimulation temperatures, along with the handling of perceptual uncertainty in modelling, requires further elucidation. The current models appear to presume that participants perceive each stimulus accurately, introducing noise only at the response stage. This assumption may fail to capture the inherent uncertainty in the perception of each stimulus intensity, especially when differences in consecutive temperatures are as minimal as 1°C.

      We agree with the reviewer that multiple sources of uncertainty are involved in rating the intensity of thermal stimuli, including perceptual uncertainty. To account for inaccurate perception, one would have to consider the different sources that contribute to it, of which there may be many. In our approach, we consider one, which is captured in the expectation-weighted models and is most clearly exemplified in the expectation-weighted Kalman filter model (eKF). The model treats participants’ perception of the input as an imperfect indicator of the true level of pain. In this case, perception is corrupted by the expectation participants hold about the upcoming stimuli. The extent of this effect is partly governed by a subjective level of noise ϵ, which may also subsume other sources of uncertainty beyond the expectation effect. Moreover, the response noise ξ could also subsume any other unexplained sources of noise.
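
      The idea of perception being pulled towards expectation can be sketched with a one-dimensional filter. This is a simplified illustration and not the fitted eKF from the manuscript: the specific weighting rule, the function name, and the parameter names (`s2` for observation noise, `v` for volatility, `eps2` for the squared subjective noise) are assumptions made for the example.

```python
def run_expectation_weighted_filter(inputs, m0, w0, s2, v, eps2):
    """Return per-trial percepts from a simplified expectation-weighted Kalman filter.

    m0, w0: initial expectation mean and variance (w0 >= 0)
    s2:     observation noise variance (> 0)
    v:      volatility added to the expectation variance each trial (> 0)
    eps2:   subjective noise variance; larger values pull percepts toward expectation
    """
    m, w = m0, w0
    percepts = []
    for x in inputs:
        w_pred = w + v                      # expectation uncertainty before the stimulus
        gamma = eps2 / (eps2 + w_pred)      # weight given to the expectation
        percept = gamma * m + (1.0 - gamma) * x
        gain = w_pred / (w_pred + s2)       # standard Kalman gain
        m = m + gain * (percept - m)        # update the expectation toward the percept
        w = (1.0 - gain) * w_pred           # posterior expectation variance
        percepts.append(percept)
    return percepts


# Hypothetical sequence: a stable input followed by a jump in intensity.
example_percepts = run_expectation_weighted_filter(
    [50.0, 60.0], m0=50.0, w0=10.0, s2=5.0, v=1.0, eps2=10.0
)
```

      With `eps2` set to zero the percept equals the input on every trial, which corresponds to a model without expectation weighting; with positive `eps2` the percept after the jump lies between the prior expectation and the input.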

      Author response image 1.

      Stimulus intensity transformation

      Reviewer Comment 3.3 — A key conclusion drawn is that eKF is a better model than eRL. However, a closer examination of the results reveals that the two models behave very similarly, and it is not clear that they can be readily distinguished based on model recovery and model comparison results.

      While the eKF appears to rank higher than the eRL in terms of LOOIC and sigma effects, we do not wish to make sweeping statements regarding the significance of differences between eRL and eKF, but merely point to the trend in the data. We shall make this clearer in the revised version of the manuscript. However, the most important result is that the models involving expectation-weighting arguably capture the data better.

      Reviewer Comment 3.4 — Regarding model recovery, the distinction between the eKF and eRL models seems blurred. When the simulation is based on the eKF, there is no ability to distinguish whether either eKF or eRL is better. When the simulation is based on the eRL, the eRL appears to be the best model, but the difference with eKF is small. This raises a few more questions. What is the range of the parameters used for the simulations?

      We agree that the distinction between eKF and eRL in the model recovery is not that clean-cut, which may in turn point to the similarity between the two models. To simulate the data for the model and parameter recovery analysis, we used the group means and variances estimated on the participant data to sample individual parameter values.
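
      The sampling step described above can be sketched as follows. This is a minimal sketch assuming independent normal distributions per parameter; the function name and the parameter names and values in the example are hypothetical, and bounded parameters such as a learning rate would in practice need to be sampled on a transformed scale.

```python
import math
import random


def sample_individual_parameters(group_means, group_variances, n_participants, seed=0):
    """Draw one parameter set per simulated participant from group-level normals."""
    rng = random.Random(seed)  # seeded for reproducible simulations
    samples = []
    for _ in range(n_participants):
        participant = {
            name: rng.gauss(group_means[name], math.sqrt(group_variances[name]))
            for name in group_means
        }
        samples.append(participant)
    return samples


# Hypothetical group-level estimates for two illustrative parameters.
simulated = sample_individual_parameters(
    {"subjective_noise": 1.5, "response_noise": 0.8},
    {"subjective_noise": 0.04, "response_noise": 0.01},
    n_participants=5,
    seed=42,
)
```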

      Reviewer Comment 3.5 — Is it possible that either eRL or eKF are best when different parameters are simulated? Additionally, increasing the number of simulations to at least 100 could provide more convincing model recovery results.

      It is a possibility, but it would require further investigation and a comparison of fits for different bins/ranges of parameters to see whether one model has a consistent advantage over the other in each bin. We will consider adding this analysis, and will provide an additional 50 simulations to paint a more convincing picture.

      Reviewer Comment 3.6 — Regarding model comparison, the authors reported that “the expectation-weighted KF model offered a better fit than the eRL, although in conditions of high stochasticity, this difference was short of significance against the eRL model.” This interpretation is based on a significance test that hinges on the ratio between the ELPD and the surrounding standard error (SE). Unfortunately, there’s no agreed-upon threshold of SEs that determines significance, but a general guideline is to consider “several SEs,” with a higher number typically viewed as more robust. However, the text lacks clarity regarding the specific number of SEs applied in this test. At a cursory glance, it appears that the authors may have employed 2 SEs in their interpretation, while only depicting 1 SE in Figure 4.

      Indeed, we used a 2-sigma effect as the threshold; however, we recognise that there is no agreed-upon threshold, and in the revision we shall make this, and our interpretation of the trend in the data, clearer.
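
      The thresholding logic can be sketched as below. This is an illustration under the assumption that the sigma effect is the absolute ELPD difference between two models divided by the standard error of that difference; the function names and the numbers in the example are hypothetical and are not values from the manuscript.

```python
def sigma_effect(elpd_diff, se_diff):
    """Ratio of the absolute ELPD difference to its standard error."""
    if se_diff <= 0:
        raise ValueError("se_diff must be positive")
    return abs(elpd_diff) / se_diff


def exceeds_threshold(elpd_diff, se_diff, n_se=2.0):
    """True when the ELPD difference is at least n_se standard errors from zero."""
    return sigma_effect(elpd_diff, se_diff) >= n_se


# Hypothetical comparison: a difference of 15 with a standard error of 10.
passes_two_se = exceeds_threshold(-15.0, 10.0)            # 1.5 SE, below a 2 SE cut-off
passes_one_se = exceeds_threshold(-15.0, 10.0, n_se=1.0)  # same data, 1 SE cut-off
```

      Because the conclusion can flip with the chosen number of standard errors, reporting the ratio itself alongside the threshold makes the interpretation transparent.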

      Reviewer Comment 3.7 — With respect to parameter recovery, a few additional details could be included for completeness. Specifically, while the range of the learning rate is understandably confined between 0 and 1, the range of other simulated parameters, particularly those without clear boundaries, remains ambiguous. Including scatter plots with the simulated parameters on the x-axis and the recovered parameters on the y-axis would effectively convey this missing information. Furthermore, it would be beneficial for the authors to clarify whether the same priors were used for both the modelling results presented in the main paper and the parameter recovery presented in the supplementary material.

      Thank you for this comment and for the suggestions. To simulate the data for the model and parameter recovery analysis, we used the group means and variances estimated on the participant data to sample individual parameter values. The priors on the group- and individual-level parameters in the recovery analysis were the same as in the fitting procedure. We will include the requested scatter plots in the next iteration of the manuscript.

      Reviewer Comment 3.8 — While the reliance on R-hat values for convergence in model fitting is standard, a more comprehensive assessment could include estimates of the effective sample size (bulk ESS and/or tail ESS) and the Estimated Bayesian Fraction of Missing Information (EBFMI), to show efficient sampling across the distribution. Consideration of divergences, if any, would further enhance the reliability of the results.

      Thank you very much for this suggestion, we will aim to include these measures in the revised version.
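
      Of the diagnostics mentioned by the reviewer, the energy-based one is simple enough to sketch directly. The snippet below computes the E-BFMI for a single chain as the sum of squared successive energy differences divided by the sum of squared deviations from the mean energy. The function name and the example energy values are hypothetical; in practice the energies come from the sampler output, and R-hat and effective sample sizes are usually taken from an established diagnostics library rather than re-implemented.

```python
def e_bfmi(energies):
    """Estimated Bayesian fraction of missing information for one chain."""
    if len(energies) < 2:
        raise ValueError("need at least two energy values")
    mean_energy = sum(energies) / len(energies)
    # Numerator: squared changes in energy between successive draws.
    numerator = sum((b - a) ** 2 for a, b in zip(energies, energies[1:]))
    # Denominator: squared deviations of the energy from its chain mean.
    denominator = sum((e - mean_energy) ** 2 for e in energies)
    if denominator == 0.0:
        raise ValueError("energy is constant; E-BFMI is undefined")
    return numerator / denominator


# Hypothetical energy trace for a short chain.
example_value = e_bfmi([0.0, 1.0, 0.0, 1.0])
```

      Low values indicate that the momentum resampling explores the energy distribution inefficiently, which is why the statistic is reported per chain alongside divergences.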

      Reviewer Comment 3.9 — The authors write: “Going beyond conditioning paradigms based in cuing of pain outcomes, our findings offer a more accurate description of endogenous pain regulation.” Unfortunately, this statement isn’t substantiated by the results. The authors did not engage in a direct comparison between conditioning and sequence-based paradigms. Moreover, even if such a comparison had been made, it remains unclear what would constitute the gold standard for quantifying “endogenous pain regulation.”

      This is a valid point. Indeed, we do not compare paradigms in our study, and we will remove this statement in the future version.

    1. Author response:

      Reviewer #1 (Public Review):  

      Weaknesses:  

      The weakness of this study lies in the fact that many of the genomic datasets originated from novel methods that were not validated with orthogonal approaches, such as DNA-FISH. Therefore, the detailed correlations described in this work are based on methodologies whose efficacy is not clearly established. Specifically, the authors utilized two modified protocols of TSA-seq for the detection of NADs (MKI67IP TSA-seq) and LADs (LMNB1-TSA-seq). Although these methods have been described in a bioRxiv manuscript by Kumar et al., they have not yet been published. Moreover, and surprisingly, Kumar et al., work is not cited in the current manuscript, despite its use of all TSA-seq data for NADs and LADs across the four cell lines. Moreover, Kumar et al. did not provide any DNA-FISH validation for their methods. Therefore, the interesting correlations described in this work are not based on robust technologies.    

      An attempt to validate the data was made for SON-TSA-seq of human foreskin fibroblasts (HFF) using multiplexed FISH data from IMR90 fibroblasts (from the lung) by the Zhuang lab (Su et al., 2020). However, the comparability of these datasets is questionable. It might have been more reasonable for the authors to conduct their analyses in IMR90 cells, thereby allowing them to utilize MERFISH data for validating the TSA-seq method and also for mapping NADs and LADs. 

      We disagree with the statement that the TSA-seq approach and data have not been validated by orthogonal approaches, and with the conclusion that the TSA-seq approach is not robust, as summarized here and detailed below in “Specific Comments”. TSA-seq is robust because it is based only on the original immunostaining specificity provided by the primary and secondary antibodies plus the diffusion properties of the tyramide-free radical. TSA-seq has been extensively validated by microscopy and by the orthogonal genomic measurements provided by LMNB1 DamID and NAD-seq. This includes: a) the initial validation by FISH of both nuclear speckle (to an accuracy of ~50 nm) and nuclear lamina TSA-seq and the cross-validation of nuclear lamina TSA-seq with lamin B1 DamID in a first publication (Chen et al, JCB 2018, doi: 10.1083/jcb.201807108); b) the further validation of SON TSA-seq by FISH in a second publication (Zhang et al, Genome Research 2021, doi:10.1101/gr.266239.120); c) the cross-validation of nucleolar TSA-seq using NAD-seq and the validation by light microscopy of the predictions of differences in the relative distributions of centromeres, nuclear speckles, and nucleoli made from nuclear speckle, nucleolar, and pericentric heterochromatin TSA-seq in the Kumar et al, bioRxiv preprint (which is in a last revision stage involving additional formatting for the journal requirements) doi:https://doi.org/10.1101/2023.10.29.564613; d) the extensive validation of nuclear speckle, LMNB1, and nucleolar TSA-seq generated in HFF human fibroblasts using published light microscopy distance measurements of hundreds of probes generated by multiplexed immuno-FISH MERFISH data (Su et al, Cell 2020, https://doi.org/10.1016/j.cell.2020.07.032), as we described for nucleolar TSA-seq in the Kumar et al, bioRxiv preprint and to some extent for LMNB1 and SON TSA-seq in the current manuscript version (see Specific Comments with attached Author response image 2).

      Reviewer 1 raised concerns regarding this FISH validation given that the HFF TSA-seq and DamID data was compared to IMR90 MERFISH measurements.  The Su et al, Cell 2020 MERFISH paper came out well after the 4D Nucleome Consortium settled on HFF as one of the two main “Tier 1” cell lines.  We reasoned that the nuclear genome organization in a second fibroblast cell line would be sufficiently similar to justify using IMR90 FISH data as a proxy for our analysis of our HFF data. Indeed, there is a high correlation between the HFF TSA-seq and distances measured by MERFISH to nuclear lamina, nucleoli, and nuclear speckles (Author response image 1).  Comparing HFF SON-TSA-seq data with published IMR90 SON TSA-seq data (Alexander et al, Mol Cell 2021, doi.org/10.1016/j.molcel.2021.03.006), the HFF SON TSA-seq versus MERFISH scatterplot is very similar to the IMR90 SON TSA-seq versus MERFISH scatterplot.  We acknowledge the validation provided by the IMR90 MERFISH is limited by the degree to which genome organization relative to nuclear locales is similar in IMR90 and HFF fibroblasts. However, the correlation between measured microscopic distances from nuclear lamina, nucleoli, and nuclear speckles and TSA-seq scores is already quite high. We anticipate the conclusions drawn from such comparisons are solid and will only become that much stronger with future comparisons within the same cell line.

      Author response image 1.

      Scatterplots showing the correlation between TSA-seq and MERFISH microscopic distances. Top: IMR90 SON TSA-seq (from Alexander et al, Mol Cell 2021) (left) and HFF SON TSA-seq (right) (x-axis) versus distance to nuclear speckles (y-axis). Bottom: HFF Lamin B1 TSA-seq (x-axis) versus distance to nuclear lamina (y-axis) (left) and HFF MKI67IP (nucleolar) TSA-seq (x-axis) versus distance to nucleolus (y-axis) (right).

      In our revision, we will add justification of the use of IMR90 fibroblasts as a proxy for HFF fibroblasts through comparison of available data sets. 

      Reviewer #2 (Public Review):  

      Weaknesses:  

      The experiments are largely descriptive, and it is difficult to draw many cause-and-effect relationships. Similarly, the paper would be very much strengthened if the authors provided additional summary statements and interpretation of their results (especially for those not as familiar with 3D genome organization). The study would benefit from a clear and specific hypothesis.

      We acknowledge that this study was hypothesis-generating rather than hypothesis-testing in its goal. This research was funded through the NIH 4D-Nucleome Consortium, which had as its initial goal the development, benchmarking, and validation of new genomic technologies. Our Center focused on the mapping of the genome relative to different nuclear locales and the correlation of this intranuclear positioning of the genome with functions, specifically gene expression and DNA replication timing. By its very nature, this project has taken a discovery-driven rather than hypothesis-driven scientific approach. Our question fundamentally was whether we could gain new insights into nuclear genome organization through the integration of genomic and microscopic measurements of chromosome positioning relative to multiple different nuclear compartments/bodies and their correlation with functional assays such as RNA-seq and Repli-seq.

      Indeed, as described in this manuscript, this study resulted in multiple new insights into nuclear genome organization as summarized in our last main figure.  We believe our work and conclusions will be of general interest to scientists working in the fields of 3D genome organization and nuclear cell biology.  We anticipate that each of these new insights will prompt future hypothesis-driven science focused on specific questions and the testing of cause-and-effect relationships. 

      Given the extensive scope of this manuscript, we were limited in the extent that we could describe and summarize the background, data, analysis, and significance for every new insight. In our editing to reach the eLife recommended word count, we removed some of the explanations and summaries that we had originally included. 

      As suggested by Reviewer 2, in our revision we will add back additional summary and interpretation statements to help readers unfamiliar with 3D genome organization.

      Specific Comments in response to Reviewer 1:

      (1)  We disagree with the comment that TSA-seq has not been cross-validated by other orthogonal genomic methods.  In the first TSA-seq paper (Chen et al, JCB 2018, doi: 10.1083/jcb.201807108), we showed a good correlation between the identification of iLADs and LADs by nuclear lamin and nuclear speckle TSA-seq and the orthogonal genomic method of lamin B1 DamID, which is reproduced using our new TSA-seq 2.0 protocol in this manuscript.  Similarly, in the Kumar et al, bioRxiv preprint (doi:https://doi.org/10.1101/2023.10.29.564613), we showed a general agreement between the identification of NADs by nucleolar TSA-seq and the orthogonal genomic method of NAD-seq.  (We expect this preprint to be in press soon; it is now undergoing a last revision involving only reformatting for journal requirements.) Additionally, we also showed a high correlation between Hi-C compartments and subcompartments and TSA-seq in the Chen et al, JCB 2018 paper. Specifically, there is an excellent correlation between the A1 Hi-C subcompartment and Speckle Associated Domains as detected by nuclear speckle TSA-seq.  Additionally, the A2 Hi-C subcompartment correlated well with iLAD regions with intermediate nuclear speckle TSA-seq scores, and the B2 and B3 Hi-C subcompartments with LADs detected by both LMNB TSA-seq and LMNB1 DamID.  More generally, Hi-C A and B compartment identity correlated well with predictions of iLADs versus LADs from nuclear speckle and nuclear lamina TSA-seq.

      (2)  In the Chen et al, JCB 2018 paper we also qualitatively and quantitatively validated TSA-seq using FISH.  Qualitatively, we showed that both nuclear speckle and nuclear lamin TSA-seq correlated well with distances to nuclear speckles versus the nuclear lamina, respectively, measured by immuno-FISH.

      Quantitatively, we showed that SON TSA-seq could be used to estimate the microscopic mean distance to nuclear speckles with mean and median residuals of ~50 nm.  First, we used light microscopy to show that the spreading of tyramide-biotin signal from a point-source of TSA staining fits well with the exponential decay predicted theoretically by reaction-diffusion equations assuming a steady rate of tyramide-biotin free radical generation by the HRP enzyme and a constant probability throughout the nucleus of free-radical quenching (through reaction with protein tyrosine residues and nucleic acids).  Second, we used the exponential decay constant measured by light microscopy together with FISH measurements of mean speckle distance for several genomic regions to fit an exponential function and to predict distance to nuclear speckles genome-wide directly from SON TSA-seq sequencing reads.  Third, we used this approach to test the predictions against a new set of FISH measurements, demonstrating an accuracy of these predictions of ~50 nm.
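The calibration logic described above can be illustrated with a minimal sketch. It assumes a simplified single-exponential model, signal = A · exp(−d / λ), with the decay length λ fixed in advance (standing in for the value measured by light microscopy) and the amplitude A fitted from a few calibration loci. The function names, the model form, and all numbers are hypothetical and are not taken from the Chen et al. analysis.

```python
import numpy as np


def fit_amplitude(distances_um, signals, decay_length_um):
    """Fit A in signal = A * exp(-d / decay_length) with the decay length fixed.

    Taking logs gives log(signal) = log(A) - d / decay_length, so the
    least-squares estimate of log(A) is the mean of log(signal) + d / decay_length.
    """
    distances_um = np.asarray(distances_um, dtype=float)
    signals = np.asarray(signals, dtype=float)
    log_a = np.mean(np.log(signals) + distances_um / decay_length_um)
    return float(np.exp(log_a))


def predict_distance(signals, amplitude, decay_length_um):
    """Invert the exponential model to estimate mean distance from signal."""
    signals = np.asarray(signals, dtype=float)
    return decay_length_um * np.log(amplitude / signals)


# Hypothetical calibration loci: FISH distances (um) and enrichment signals.
calib_distance = np.array([0.2, 0.5, 0.9])
calib_signal = np.array([4.1, 1.5, 0.4])
decay_length = 0.3  # hypothetical value standing in for the microscopy estimate

amplitude = fit_amplitude(calib_distance, calib_signal, decay_length)
estimated = predict_distance(calib_signal, amplitude, decay_length)
residuals = estimated - calib_distance  # compare against held-out FISH data in practice
```

In a real validation the residuals would be computed on a separate set of FISH measurements, as in the third step described above, rather than on the calibration loci themselves.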

      (3)  The importance of the quantitative validation by immuno-FISH of using TSA-seq to estimate mean distance to nuclear speckles is that it demonstrates the robustness of the TSA-seq approach.  Specifically, it shows how the TSA-seq signal is predicted to depend only on the specificity of the primary and secondary antibody staining and the diffusion properties of the tyramide-biotin free radicals produced by the HRP enzyme.  This is fundamentally different from the significant dependence on antibodies and choice of marker proteins for molecular proximity assays such as DamID, ChIP-seq, and Cut and Run/Tag, which depend on molecular proximity for labeling and/or pulldown of DNA.

      This robustness leads to specific predictions.  First, it predicts that similar TSA-seq signals will be produced using antibodies against different marker proteins for the same nuclear compartment.  This is because the exponential decay constant (distance at which the signal drops by one half) for the spreading of the TSA is in the range of several hundred nm, as measured by light microscopy for several TSA staining conditions.  Indeed, we showed in the Chen et al, JCB 2018 paper that antibodies against two different nuclear speckle proteins produced very similar TSA-seq signals while antibodies against LMNB versus LMNA also produced very similar TSA-seq signals.  Similarly, we showed in the Kumar et al preprint that antibodies against four different nucleolar proteins showed similar TSA-seq signals, with the highest correlation coefficients for the TSA-seq signals produced by the antibodies against two GC nucleolar marker proteins and the TSA-seq signals produced by the antibodies against two FC/DFC nucleolar marker proteins.

      Author response image 2.

      Comparison of TSA-seq data from different cell lines versus IMR90 MERFISH.  The observed correlation between SON (nuclear speckle) TSA-seq versus MERFISH is nearly as high for TSA-seq data from HFF as it is for TSA-seq data from the IMR90 cell line (Alexander et al, Mol Cell 2021) in which the MERFISH was performed. The correlations for SON, LMNB1 (nuclear lamina) and MKI67IP (nucleolus) versus MERFISH are highest for HFF TSA-seq data as compared to TSA-seq data from other cell lines (H1, K562, HCT116).  Comparison of measured distances to nuclear locale (y-axis) versus TSA-seq scores (x-axis) from different cell lines labeled in red. Left to right: SON, LMNB1, and MKI67IP.  Top to bottom: SON TSA-seq versus MERFISH for two TSA-seq replicates; TSA-seq from HFF, H1, K562, and HCT116 versus MERFISH.

      Second, it predicts that the quantitative relationship between TSA-seq signal and mean distance from a nuclear compartment will depend on the convolution of the predicted exponential decay of spreading of the TSA signal produced by a point source with the more complicated staining distribution of nuclear compartments such as the nuclear lamina or nucleoli.  We successfully used this concept to explain the differences emerging between LMNB1 DamID and TSA-seq signals for flat nuclei and to recognize the polarized distribution of different LADs over the nuclear periphery.

      (4)  After our genomic data production and during our data analysis, a valuable resource from the Zhuang lab was published, using MERFISH to visualize hundreds of genomic loci in IMR90 cells. We acknowledge that the much more extensive validation of TSA-seq by the multiplexed immuno-FISH MERFISH data is dependent on the degree to which the nuclear genome organization is similar between IMR90 and HFF fibroblasts.  However, the correlation between distances to nuclear speckles, nucleoli, and the nuclear lamina measured in IMR90 fibroblasts and the nuclear speckle, nucleolar, and nuclear lamina TSA-seq measured in HFF fibroblasts is already striking (See Author response image 1).  With regard to SON TSA-seq, the MERFISH versus HFF TSA-seq correlation is close to what we observe using published IMR90 SON TSA-seq data (correlation coefficients of 0.89 for IMR90 TSA-seq versus 0.86 for HFF TSA-seq).  Moreover, this correlation is highest using TSA-seq data from HFF cells as compared to the three other cell lines (see Author response image 2).  We believe these correlations can be considered a lower bound on the actual correlations between the FISH distances and TSA-seq that we would have observed if we had performed both assays on the same cell line.

      (5)  Currently, we still require tens of millions of cells to perform each TSA-seq assay.  This requires significant expansion of cells and a resulting increase in passage numbers of the IMR90 cells before we can perform the TSA-seq. During this expansion we observe a noticeable slowing of the IMR90 cell growth, as expected for secondary cell lines as we approach the Hayflick limit.  We still do not know to what degree nuclear organization relative to nuclear locales may change as a function of cell cycle composition (i.e., percentage of cycling versus quiescent cells) and cell age.  Thus, even if we performed TSA-seq on IMR90 cells, we would be comparing MERFISH from lower passages with a higher percentage of actively proliferating cells with TSA-seq from higher passages with a higher percentage of quiescent cells.

      We are currently working on a new TSA-seq protocol that will work with thousands of cells.  We believe it is a better investment of time and resources to wait until this new protocol is optimized before we repeat TSA-seq in IMR90 cells for a better comparison with multiplexed FISH data.

      Specific Comments in response to Reviewer 2:

      (1)  As we acknowledge in our Response summary, we were limited in the degree to which we could actually follow-up our findings with experiments designed to test specific hypotheses generated by our data.  However, we do want to point out that our comparison of wild-type K562 cells with the LMNA/LBR double knockout was designed to test the long-standing model that nuclear lamina association of genomic loci contributes to gene silencing.  This experiment was motivated by our surprising result that gene expression differences between cell lines correlated strongly with differences in positioning relative to nuclear speckles rather than the nuclear lamina.  Despite documenting in these double knockout cells a decreased nuclear lamina association of most LADs, and an increased nuclear lamina association of the “p-w-v” fiLADs identified in this manuscript, we saw no significant change in gene expression in any of these regions as compared to wild-type K562 cells.  Meanwhile, distances to nuclear speckles as measured by TSA-seq remained nearly constant.

      We would argue that this represents a specific example in which new insights generated by our genomics comparison of cell lines led to a clear and specific hypothesis and the experimental testing of this hypothesis.

      In response to Reviewer 2, we are modifying the text to make this clearer. We will explicitly describe how we tested the hypothesis that distance to the nuclear lamina is correlated with, but not causally linked to, gene expression: we used a DKO of LMNA and LBR to change distances relative to the nuclear lamina and then measured the effect on gene expression.

    1. Author response:

      We thank the reviewers for their thorough reading and thoughtful feedback. Below, we provisionally address each of the concerns raised in the public reviews, and outline our planned revision that aims to further clarify and strengthen the manuscript.

      In our response, we clarify our conceptualization of elasticity as a dimension of controllability, formalizing it within an information-theoretic framework, and demonstrating that controllability and its elasticity are partially dissociable. Furthermore, we provide clarifications and additional modeling results showing that our experimental design and modeling approach are well-suited to dissociating elasticity inference from more general learning processes, and are not inherently biased to find overestimates of elasticity. Finally, we clarify the advantages and disadvantages of our canonical correlation analysis (CCA) approach for identifying latent relationships between multidimensional data sets, and provide additional analyses that strengthen the link between elasticity estimation biases and a specific psychopathology profile.

      Reviewer 1:

      This research takes a novel theoretical and methodological approach to understanding how people estimate the level of control they have over their environment, and how they adjust their actions accordingly. The task is innovative and both it and the findings are well-described (with excellent visuals). They also offer thorough validation for the particular model they develop. The research has the potential to theoretically inform the understanding of control across domains, which is a topic of great importance.

      We thank the reviewer for their favorable appraisal and valuable suggestions, which have helped clarify and strengthen the study’s conclusion. 

      An overarching concern is that this paper is framed as addressing resource investments across domains that include time, money, and effort, and the introductory examples focus heavily on effort-based resources (e.g., exercising, studying, practicing). The experiments, though, focus entirely on the equivalent of monetary resources - participants make discrete actions based on the number of points they want to use on a given turn. While the same ideas might generalize to decisions about other kinds of resources (e.g., if participants were having to invest the effort to reach a goal), this seems like the kind of speculation that would be better reserved for the Discussion section rather than using effort investment as a means of introducing a new concept (elasticity of control) that the paper will go on to test.

      We thank the reviewer for pointing out a lack of clarity regarding the kinds of resources tested in the present experiment. Investing additional resources in the form of extra tickets did not only require participants to pay more money. It also required them to invest additional time – since each additional ticket meant making another attempt to board the vehicle, extending the duration of the trial, and attentional effort – since every attempt required precisely timing a spacebar press as the vehicle crossed the screen. Given this involvement of money, time, and effort resources, we believe it would be imprecise to present the study as concerning monetary resources in particular. That said, we agree with the Reviewer that results might differ depending on the resource type that the experiment or the participant considers most. Thus, in our revision of the manuscript, we will make sure to clarify the kinds of resources the experiment involved, and highlight the open question of whether inferences concerning the elasticity of control generalize across different resource domains.

      Setting aside the framing of the core concepts, my understanding of the task is that it effectively captures people's estimates of the likelihood of achieving their goal (Pr(success)) conditional on a given investment of resources. The ground truth across the different environments varies such that this function is sometimes flat (low controllability), sometimes increases linearly (elastic controllability), and sometimes increases as a step function (inelastic controllability). If this is accurate, then it raises two questions.

      First, on the modeling front, I wonder if a suitable alternative to the current model would be to assume that the participants are simply considering different continuous functions like these and, within a Bayesian framework, evaluating the probabilistic evidence for each function based on each trial's outcome. This would give participants an estimate of the marginal increase in Pr(success) for each ticket, and they could then weigh the expected value of that ticket choice (Pr(success)*150 points) against the marginal increase in point cost for each ticket. This should yield similar predictions for optimal performance (e.g., opt-out for lower controllability environments, i.e., flatter functions), and the continuous nature of this form of function approximation also has the benefit of enabling tests of generalization to predict changes in behavior if there was, for instance, changes in available tickets for purchase (e.g., up to 4 or 5) or changes in ticket prices. Such a model would of course also maintain a critical role for priors based on one's experience within the task as well as over longer timescales, and could be meaningfully interpreted as such (e.g., priors related to the likelihood of success/failure and whether one's actions influence these). It could also potentially reduce the complexity of the model by replacing controllability-specific parameters with multiple candidate functions (presumably learned through past experience, and/or tuned by experience in this task environment), each of which is being updated simultaneously.

      Second, if the reframing above is apt (regardless of the best model for implementing it), it seems like the taxonomy being offered by the authors risks a form of "jangle fallacy," in particular by positing distinct constructs (controllability and elasticity) for processes that ultimately comprise aspects of the same process (estimation of the relationship between investment and outcome likelihood). Which of these two frames is used doesn't bear on the rigor of the approach or the strength of the findings, but it does bear on how readers will digest and draw inferences from this work. It is ultimately up to the authors which of these they choose to favor, but I think the paper would benefit from some discussion of a common-process alternative, at least to prevent too strong of inferences about separate processes/modes that may not exist. I personally think the approach and findings in this paper would also be easier to digest under a common-construct approach rather than forcing new terminology but, again, I defer to the authors on this.

      We thank the reviewer for suggesting this interesting alternative modeling approach. We agree that a Bayesian framework evaluating different continuous functions could offer advantages, particularly in its ability to generalize to other ticket quantities and prices. We will attempt to implement this as an alternative model and compare it with the current model.  
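To make the suggested alternative concrete, here is a minimal sketch of Bayesian evaluation over a small set of candidate success functions. It is only an illustration of the general idea, not the model we will fit: the three candidate functions, their probability values, and the ticket price `TICKET_COST` are hypothetical assumptions; only the 150-point reward comes from the comment above.

```python
import numpy as np

REWARD = 150      # points for reaching the goal (from the reviewer's comment)
TICKET_COST = 40  # hypothetical price per ticket, for illustration only

# Hypothetical candidate functions: P(success | tickets = 0, 1, 2, 3).
NAMES = ["flat", "linear", "step"]
P_SUCCESS = np.array([
    [0.20, 0.20, 0.20, 0.20],  # low controllability
    [0.20, 0.20, 0.60, 0.98],  # success grows with investment
    [0.20, 0.98, 0.98, 0.98],  # one ticket already suffices
])


def update(posterior, tickets, success):
    """Bayes update of the belief over candidate functions after one trial."""
    p = P_SUCCESS[:, tickets]
    likelihood = p if success else 1.0 - p
    unnormalized = posterior * likelihood
    return unnormalized / unnormalized.sum()


def expected_values(posterior):
    """Expected net points for buying 0..3 tickets under the current belief."""
    p_success = posterior @ P_SUCCESS
    costs = TICKET_COST * np.arange(P_SUCCESS.shape[1])
    return REWARD * p_success - costs


# Start from a uniform prior and observe two hypothetical trials.
posterior = np.full(len(NAMES), 1.0 / len(NAMES))
for tickets, success in [(3, True), (1, False)]:
    posterior = update(posterior, tickets, success)
values = expected_values(posterior)
```

Because each candidate is a function over ticket counts, the same machinery extends to more tickets or different prices by widening `P_SUCCESS` and changing `TICKET_COST`, which is the generalization benefit the Reviewer points to.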

      We also acknowledge the importance of avoiding a potential "jangle fallacy". We entirely agree with the Reviewer that elasticity and controllability inferences are not distinct processes. Specifically, we view resource elasticity as a dimension of controllability, hence the name of our ‘elastic controllability’ model. In response to this and other Reviewers’ comments, we now offer a formal definition of elasticity as the reduction in uncertainty about controllability due to knowing the amount of resources the agent is able and willing to invest (see further details in response to Reviewer 3 below).  

      With respect to how this conceptualization is expressed in the modelling, we note that the representation in our model of maximum controllability and its elasticity via different variables is analogous to how a distribution may be represented by separate mean and variance parameters. Ultimately, even in the model suggested by the Reviewer, there would need to be a dedicated variable representing elasticity, such as the probability of sloped controllability functions. A single-process account thus allows that different aspects of this process would be differently biased (e.g., one can have an accurate estimate of the mean of a distribution but overestimate its variance). Therefore, our characterization of distinct elasticity and controllability biases (or to put it more accurately, ‘elasticity of controllability bias’ and ‘maximum controllability bias’) is consistent with a common construct account. 

      That said, given the Reviewer’s comments, we believe that some of the terminology we used may have been misleading. In our planned revision, we will modify the text to clarify that we view elasticity as a dimension of controllability that can only be estimated in conjunction with controllability. 

      Reviewer 2:

      This research investigates how people might value different factors that contribute to controllability in a creative and thorough way. The authors use computational modeling to try to dissociate "elasticity" from "overall controllability," and find some differential associations with psychopathology. This was a convincing justification for using modeling above and beyond behavioral output and yielded interesting results. Interestingly, the authors conclude that these findings suggest that biased elasticity could distort agency beliefs via maladaptive resource allocation. Overall, this paper reveals some important findings about how people consider components of controllability.

      We appreciate the Reviewer's positive assessment of our findings and computational approach to dissociating elasticity and overall controllability.

      The primary weakness of this research is that it is not entirely clear what is meant by "elastic" and "inelastic" and how these constructs differ from existing considerations of various factors/calculations that contribute to perceptions of and decisions about controllability. I think this weakness is primarily an issue of framing, where it's not clear whether elasticity is, in fact, theoretically dissociable from controllability. Instead, it seems that the elements that make up "elasticity" are simply some of the many calculations that contribute to controllability. In other words, an "elastic" environment is inherently more controllable than an "inelastic" one, since both environments might have the same level of predictability, but in an "elastic" environment, one can also partake in additional actions to have additional control over achieving the goal (i.e., expend effort, money, time).

      We thank the reviewer for highlighting the lack of clarity in our concept of elasticity. We first clarify that elasticity cannot be entirely dissociated from controllability because it is a dimension of controllability. If no controllability is afforded, then there cannot be elasticity or inelasticity. This is why in describing the experimental environments, we only label high-controllability, but not low-controllability, environments as ‘elastic’ or ‘inelastic’. For further details on this conceptualization of elasticity, and a planned revision of the text, see our response above to Reviewer 1. 

      Second, we now clarify that controllability can also be computed without knowing the amount of resources the agent is able and willing to invest, for instance by assuming infinite resources available or a particular distribution of resource availabilities. However, knowing the agent’s available resources often reduces uncertainty concerning controllability. This reduction in uncertainty is what we define as elasticity. Since any action requires some resources, this means that no controllable environment is entirely inelastic if we also consider agents that do not have enough resources to commit any action. However, even in this case environments can differ in the degree to which they are elastic. For further details on this formal definition, see our response to Reviewer 3 below. We will make these necessary clarifications in the revised manuscript. 

      Importantly, whether an environment is more or less elastic does not determine whether it is more or less controllable. In particular, environments can be more controllable yet less elastic. This is true even if we allow that investing different levels of resources (i.e., purchasing 0, 1, 2, or 3 tickets) constitutes different actions, in conjunction with participants’ vehicle choices. Below, we show this using two existing definitions of controllability.

      Definition 1, reward-based controllability<sup>1</sup>: If control is defined as the fraction of available reward that is controllably achievable, and we assume all participants are in principle willing and able to invest 3 tickets, controllability can be computed in the present task as:

      where P(S' = goal ∣ 𝑆, 𝐴, 𝐶) is the probability of reaching the treasure from present state 𝑆 when taking action 𝐴 and investing 𝐶 resources in executing the action. In any of the task environments, the probability of reaching the goal is maximized by purchasing 3 tickets (𝐶 = 3) and choosing the vehicle that leads to the goal (𝐴 = correct vehicle). Conversely, the probability of reaching the goal is minimized by purchasing 3 tickets (𝐶 = 3) and choosing the vehicle that does not lead to the goal (𝐴 = wrong vehicle). This calculation is thus entirely independent of elasticity, since it only considers what would be achieved by maximal resource investment, whereas elasticity consists of the reduction in controllability that would arise if the maximal available 𝐶 is reduced. Consequently, any environment where the maximum available control is higher yet varies less with resource investment would be more controllable and less elastic.
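One way to write the quantity described here, offered as a reconstruction from the surrounding description and not necessarily the exact expression intended, is the difference between the highest and lowest achievable goal probabilities over all action and resource combinations (the symbol χ follows its use for controllability later in this response):

```latex
\chi \;=\; \max_{A,\,C} P(S' = \text{goal} \mid S, A, C) \;-\; \min_{A,\,C} P(S' = \text{goal} \mid S, A, C)
```

Under this reading, the two task environments compared under Definition 2 both give χ = 1 − 0 = 1, because in each of them 3 tickets with the correct vehicle always reaches the goal and 3 tickets with the wrong vehicle never does.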

      Note that if we also account for ticket costs in calculating reward, this will only reduce the fraction of achievable reward and thus the calculated control in elastic environments.   

      Definition 2, information-theoretic controllability<sup>2</sup>: Here controllability is defined as the reduction in outcome entropy due to knowing which action is taken:

      I(S'; A, C | S) = H(S'|S) - H(S'|S, A, C)

      where H(S'|S) is the conditional entropy of the distribution of outcomes S' given the present state 𝑆, and H(S'|S, A, C) is the conditional entropy of the outcome given the present state, action, and resource investment. 

      To compare controllability, we consider two environments with the same maximum control:

      • Inelastic environment: If the correct vehicle is chosen, there is a 100% chance of reaching the goal state with 1, 2, or 3 tickets. Thus, out of 7 possible action-resource investment combinations, three deterministically lead to the goal state (≥1 tickets and correct vehicle choice), three never lead to it (≥1 tickets and wrong vehicle choice), and one (0 tickets) leads to it 20% of the time (since walking leads to the treasure on 20% of trials).

      • Elastic environment: If the correct vehicle is chosen, the probability of boarding it is 0% with 1 ticket, 50% with 2 tickets, and 100% with 3 tickets. Thus, out of 7 possible action-resource investment combinations, one deterministically leads to the goal state (3 tickets and correct vehicle choice), one never leads to it (3 tickets and wrong vehicle choice), one leads to it 60% of the time (2 tickets and correct vehicle choice: 50% boarding + 50% × 20% when failing to board), one leads to it 10% of the time (2 tickets and wrong vehicle choice), and three lead to it 20% of the time (0-1 tickets).

      Here we assume a uniform prior over actions, which renders the information-theoretic definition of controllability equal to another definition termed ‘instrumental divergence’<sup>3,4</sup>. We note that changing the uniform prior assumption would change the results for the two environments, but that would not change the general conclusion that there can be environments that are more controllable yet less elastic.

      Step 1: Calculating H(S'|S)

      For the inelastic environment:

      P(goal) = (3 × 100% + 3 × 0% + 1 × 20%)/7 = .46, P(non-goal) = .54

      H(S'|S) = – [.46 × log<sub>2</sub>(.46) + .54 × log<sub>2</sub>(.54)] = 1 bit

      For the elastic environment:

      P(goal) = (1 × 100% + 1 × 0% + 1 × 60% + 1 × 10% + 3 × 20%)/7 = .33, P(non-goal) = .67

      H(S'|S) = – [.33 × log<sub>2</sub>(.33) + .67 × log<sub>2</sub>(.67)] = .91 bits

      Step 2: Calculating H(S'|S, A, C)

      Inelastic environment: Six action-resource investment combinations have deterministic outcomes entailing zero entropy, whereas investing 0 tickets has a probabilistic outcome (20%). The entropy for 0 tickets is: H(S'|C = 0) = – [.2 × log<sub>2</sub>(.2) + .8 × log<sub>2</sub>(.8)] = .72 bits. Since this action-resource investment combination is chosen with probability 1/7, the total conditional entropy is approximately .10 bits.

      Elastic environment: 2 actions have deterministic outcomes (3 tickets with correct/wrong vehicle), whereas the other 5 actions have probabilistic outcomes:

      2 tickets and correct vehicle (60% success):

      H(S'|A = correct, C = 2) = – [.6 × log<sub>2</sub>(.6) + .4 × log<sub>2</sub>(.4)] = .97 bits

      2 tickets and wrong vehicle (10% success):

      H(S'|A = wrong, C = 2) = – [.1 × log<sub>2</sub>(.1) + .9 × log<sub>2</sub>(.9)] = .47 bits

      0-1 tickets (20% success):

      H(S'|C = 0-1) = – [.2 × log<sub>2</sub>(.2) + .8 × log<sub>2</sub>(.8)] = .72 bits

      Thus the total conditional entropy of the elastic environment is: H(S'|S, A, C) = (1/7) × .97 + (1/7) × .47 + (3/7) × .72 = .52 bits

      Step 3: Calculating I(S'; A, C | S)

      Inelastic environment: I(S'; A, C | S) = H(S'|S) – H(S'|S, A, C) = 1 – 0.1 = .9 bits 

      Elastic environment: I(S'; A, C | S) = H(S'|S) – H(S'|S, A, C) = .91 – .52 = .39 bits

      Thus, the inelastic environment offers higher information-theoretic controllability (.9 bits) compared to the elastic environment (.39 bits). 
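The three steps above can be reproduced with a short script. This is a minimal sketch that assumes the same uniform prior over the 7 action-resource investment combinations. The variable and function names are our own. Because it does not round intermediate values, it yields roughly .89 and .40 bits where the rounded hand calculation gives .9 and .39.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a goal / non-goal outcome with P(goal) = p."""
    return -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0)


# P(goal) for each of the 7 equally likely action-resource combinations.
INELASTIC = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.2]
ELASTIC = [1.0, 0.0, 0.6, 0.1, 0.2, 0.2, 0.2]


def controllability(goal_probs):
    """I(S'; A, C | S) = H(S'|S) - H(S'|S, A, C) under a uniform prior."""
    n = len(goal_probs)
    h_marginal = binary_entropy(sum(goal_probs) / n)              # Step 1
    h_conditional = sum(binary_entropy(p) for p in goal_probs) / n  # Step 2
    return h_marginal - h_conditional                             # Step 3


inelastic_bits = controllability(INELASTIC)
elastic_bits = controllability(ELASTIC)
```

Replacing the two probability lists is enough to check other environments, or to explore how the comparison shifts when the 20% walking probability is changed.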

      Of note, even if each combination of cost and goal reaching is defined as a distinct outcome, information-theoretic controllability remains higher for the inelastic (2.81 bits) than for the elastic (2.30 bits) environment.

      In sum, for both definitions of controllability, we see that environments can be more elastic yet less controllable. We will amend the manuscript to clarify this distinction between controllability and its elasticity.

      Reviewer 3:

      A bias in how people infer the amount of control they have over their environment is widely believed to be a key component of several mental illnesses including depression, anxiety, and addiction. Accordingly, this bias has been a major focus in computational models of those disorders. However, all of these models treat control as a unidimensional property, roughly, how strongly outcomes depend on action. This paper proposes (correctly, I think) that the intuitive notion of "control" captures multiple dimensions in the relationship between action and outcome. In particular, the authors highlight the degree to which outcome depends on how much *effort* we exert, calling this dimension the "elasticity of control". They additionally propose that this dimension (rather than the more holistic notion of controllability) may be specifically impaired in certain types of psychopathology. This idea thus has the potential to change how we think about mental disorders in a substantial way, and could even help us better understand how healthy people navigate challenging decision-making problems.

      Unfortunately, my view is that neither the theoretical nor empirical aspects of the paper really deliver on that promise. In particular, most (perhaps all) of the interesting claims in the paper have weak empirical support.

      We appreciate the Reviewer's thoughtful engagement with our research and recognition of the potential significance of distinguishing between different dimensions of control in understanding psychopathology. We believe that all the Reviewer’s comments can be addressed with clarifications or additional analyses, as detailed below.  

      Starting with theory, the elasticity idea does not truly "extend" the standard control model in the way the authors suggest. The reason is that effort is simply one dimension of action. Thus, the proposed model ultimately grounds out in how strongly our outcomes depend on our actions (as in the standard model). Contrary to the authors' claims, the elasticity of control is still a fixed property of the environment. Consistent with this, the computational model proposed here is a learning model of this fixed environmental property. The idea is still valuable, however, because it identifies a key dimension of action (namely, effort) that is particularly relevant to the notion of perceived control. Expressing the elasticity idea in this way might support a more general theoretical formulation of the idea that could be applied in other contexts. See Huys & Dayan (2009), Zorowitz, Momennejad, & Daw (2018), and Gagne & Dayan (2022) for examples of generalizable formulations of perceived control.

      We thank the Reviewer for the suggestion that we formalize our concept of elasticity to resource investment, which we agree is a dimension of action. We first note that we have not argued against the claim that elasticity is a fixed property of the environment. We surmise the Reviewer might have misread our statement that “controllability is not a fixed property of the environment”. The latter statement is motivated by the observation that controllability is often higher for agents that can invest more resources (e.g., a richer person can buy more things). We will clarify this in our revision of the manuscript.

      To formalize elasticity, we build on Huys & Dayan’s definition of controllability(1) as the fraction of reward that is controllably achievable, 𝜒 (though using information-theoretic definitions(2,3) would work as well). To the extent that this fraction depends on the amount of resources the agent is able and willing to invest (max 𝐶), this formulation can be probabilistically computed without information about the particular agent involved, specifically, by assuming a certain distribution of agents with different amounts of available resources. This would result in a probability distribution over 𝜒. Elasticity can thus be defined as the amount of information obtained about controllability due to knowing the amount of resources available to the agent: I(𝜒; max 𝐶). We will add this formal definition to the manuscript.  
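As a worked illustration of this definition, the sketch below computes I(𝜒; max 𝐶) for the two example environments from our response to Reviewer 2. It rests on several assumptions that are ours for illustration only: 𝜒 is taken as the difference between the highest and lowest goal probability achievable within the agent's budget, agents' budgets (max 𝐶) are uniformly distributed over 0 to 3 tickets, and the names used are hypothetical. Because 𝜒 is then a deterministic function of max 𝐶, H(𝜒 | max 𝐶) = 0 and the mutual information reduces to H(𝜒).

```python
import math
from collections import Counter

# (tickets invested, P(goal)) for each action-resource combination.
INELASTIC = [(0, 0.2), (1, 1.0), (1, 0.0), (2, 1.0), (2, 0.0), (3, 1.0), (3, 0.0)]
ELASTIC = [(0, 0.2), (1, 0.2), (1, 0.2), (2, 0.6), (2, 0.1), (3, 1.0), (3, 0.0)]


def chi(env, max_c):
    """Controllable fraction for an agent who can invest at most max_c tickets."""
    probs = [p for tickets, p in env if tickets <= max_c]
    return round(max(probs) - min(probs), 10)  # rounding avoids float-key mismatches


def elasticity(env, budgets=(0, 1, 2, 3)):
    """I(chi; max C) in bits, assuming a uniform distribution over budgets.

    chi is fully determined by max C here, so I(chi; max C) = H(chi).
    """
    counts = Counter(chi(env, c) for c in budgets)
    n = len(budgets)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())


inelastic_elasticity = elasticity(INELASTIC)
elastic_elasticity = elasticity(ELASTIC)
```

Under these assumptions the elastic environment carries more information about 𝜒 (1.5 bits) than the inelastic one (about .81 bits), while the inelastic environment is not at zero because an agent with no tickets has no control in either environment.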

      Turning to experiment, the authors make two key claims: (1) people infer the elasticity of control, and (2) individual differences in how people make this inference are importantly related to psychopathology. Starting with claim 1, there are three sub-claims here; implicitly, the authors make all three. (1A) People's behavior is sensitive to differences in elasticity, (1B) people actually represent/track something like elasticity, and (1C) people do so naturally as they go about their daily lives. The results clearly support 1A. However, 1B and 1C are not supported. Starting with 1B, the experiment cannot support the claim that people represent or track elasticity because the effort is the only dimension over which participants can engage in any meaningful decision-making (the other dimension, selecting which destination to visit, simply amounts to selecting the location where you were just told the treasure lies). Thus, any adaptive behavior will necessarily come out in a sensitivity to how outcomes depend on effort. More concretely, any model that captures the fact that you are more likely to succeed in two attempts than one will produce the observed behavior. The null models do not make this basic assumption and thus do not provide a useful comparison.

      We appreciate the reviewer's critical analysis of our claims regarding elasticity inference, which, as detailed below, has led to an important new analysis that strengthens the study’s conclusions. However, we respectfully disagree with two of the Reviewer’s arguments. First, resource investment was not the only meaningful decision dimension in our task, since participants also needed to choose the correct vehicle to get to the right destination. That this was not trivial is evidenced by our exclusion of over 8% of participants who made incorrect vehicle choices more than 10% of the time. Included participants also occasionally erred in this choice (mean error rate = 3%, range [0-10%]).

      Second, the experimental task cannot be solved well by a model that simply tracks how outcomes depend on effort because 20% of the time participants reached the treasure despite failing to board their vehicle of choice. In such cases, reward outcomes and control were decoupled. Participants could identify when this was the case by observing the starting location, which was revealed together with the outcome (since depending on the starting location, the treasure location was automatically reached by walking). To determine whether participants distinguished between control-related and non-control-related reward, we have now fitted a variant of our model to the data that allows learning from each of these kinds of outcomes by means of a different free parameter. The results show that participants learned considerably more from control-related outcomes. They were thus not merely tracking outcomes, but specifically inferred when outcomes can be attributed to control. We will include this new analysis in the revised manuscript.
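
      One way to picture such a variant is a pseudo-count update in which each outcome is weighted by a free parameter that depends on whether the outcome could be attributed to control. The sketch below illustrates that idea only; the function and parameter names (`update_control_belief`, `w_control`, `w_noncontrol`) and the default weights are hypothetical and do not reproduce the fitted model.

```python
def update_control_belief(a, b, rewarded, control_related,
                          w_control=1.0, w_noncontrol=0.2):
    """Update Beta-style pseudo-counts (a, b) for the belief that acting pays off.

    Outcomes attributable to control are weighted by w_control; outcomes
    obtained without control (e.g. reaching the treasure by walking) are
    weighted by w_noncontrol. Both weights are illustrative free parameters.
    """
    w = w_control if control_related else w_noncontrol
    if rewarded:
        a += w
    else:
        b += w
    return a, b


def belief_mean(a, b):
    """Mean of the belief implied by pseudo-counts (a, b)."""
    return a / (a + b)


# The same reward moves the belief further when it is attributable to control.
a1, b1 = update_control_belief(1.0, 1.0, rewarded=True, control_related=True)
a2, b2 = update_control_belief(1.0, 1.0, rewarded=True, control_related=False)
```

      Fitting the two weights separately is what allows the data to show whether learning differs between the two kinds of outcomes.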

      Controllability inference by itself, however, still does not suffice to explain the observed behavior. This is shown by our ‘controllability’ model, which learns to invest more resources to improve control, yet still fails to capture key features of participants’ behavior, as detailed in the manuscript. This means that explaining participants’ behavior requires a model that not only infers controllability (beyond merely outcome probability) but also assumes a priori that increased effort could enhance control. Building this a priori assumption into the model amounts to embedding within it an understanding of elasticity: the idea that control over the environment may be increased by greater resource investment.

      That being said, we acknowledge the value in considering alternative computational formulations of adaptation to elasticity. Thus, in our revision of the manuscript, we will add a discussion concerning possible alternative models.  

      For 1C, the claim that people infer elasticity outside of the experimental task cannot be supported because the authors explicitly tell people about the two notions of control as part of the training phase: "To reinforce participants' understanding of how elasticity and controllability were manifested in each planet, [participants] were informed of the planet type they had visited after every 15 trips." (line 384).

      We thank the reviewer for highlighting this point. We agree that our experimental design does not test whether people infer elasticity spontaneously. Our research question was whether people can distinguish between elastic and inelastic controllability. The results strongly support that they can, and this does have potential implications for behavior outside of the experimental task. Specifically, to the extent that people are aware that in some contexts additional resource investment improves control, whereas in other contexts it does not, our results indicate that they would be able to distinguish between these two kinds of contexts through trial-and-error learning. That said, we agree that investigating whether and how people spontaneously infer elasticity is an interesting direction for future work. We will clarify the scope of the present conclusions in the revised manuscript.

      Finally, I turn to claim 2, that individual differences in how people infer elasticity are importantly related to psychopathology. There is much to say about the decision to treat psychopathology as a unidimensional construct. However, I will keep it concrete and simply note that CCA (by design) obscures the relationship between any two variables. Thus, as suggestive as Figure 6B is, we cannot conclude that there is a strong relationship between Sense of Agency and the elasticity bias---this result is consistent with any possible relationship (even a negative one). The fact that the direct relationship between these two variables is not shown or reported leads me to infer that they do not have a significant or strong relationship in the data.

      We agree that CCA is not designed to reveal the relationship between any two variables. However, the advantage of this analysis is that it pulls together information from multiple variables. Doing so does not treat psychopathology as unidimensional. Rather, it seeks a particular dimension that most strongly correlates with different aspects of task performance. This is especially useful for multidimensional psychopathology data because such data are often dominated by strong correlations between dimensions, whereas the research seeks to explain the distinctions between the dimensions. Similar considerations hold for the multidimensional task parameters, which although less correlated, may still jointly predict the relevant psychopathological profile better than each parameter does in isolation. Thus, the CCA enabled us to identify a general relationship between task performance and psychopathology that accounts for different symptom measures and aspects of controllability inference. 

      Using CCA can thus reveal relationships that do not readily show up in two-variable analyses. Indeed, the direct correlation between Sense of Agency (SOA) and elasticity bias was not significant – a result that, for completeness, we will now report in the supplementary materials along with all other direct correlations. We note, however, that the CCA was preregistered and its results were replicated. Furthermore, an auxiliary analysis specifically confirmed the contributions to the observed canonical correlation of both elasticity bias (Figure 6D, bottom plot) and, although not reported in the original paper, the Sense of Agency score (SOA; p = .03, permutation test). Participants scoring higher on the psychopathology profile also overinvested resources in inelastic environments but did not futilely invest in uncontrollable environments (Figure 6A), providing external validation to the conclusion that the CCA captured meaningful variance specific to elasticity inference. The results thus enable us to safely conclude that differences in elasticity inferences are significantly associated with a profile of control-related psychopathology to which SOA contributed significantly.

      Finally, whereas interpretation of individual CCA loadings that were not specifically tested remains speculative, we note that the pattern of loadings largely replicated across the initial and replication studies (see Figure 6B), and aligns with prior findings. For instance, the positive loadings of SOA and OCD match prior suggestions that a lower sense of control leads to greater compensatory effort(7), whereas the negative loading for depression scores matches prior work showing reduced resource investment in depression(5-6).

      We will revise the text to better clarify the advantages and disadvantages of our analytical approach, and the conclusions that can and cannot be drawn from it.

      There is also a feature of the task that limits our ability to draw strong conclusions about individual differences in elasticity inference. As the authors clearly acknowledge, the task was designed "to be especially sensitive to overestimation of elasticity" (line 287). A straightforward consequence of this is that the resulting *empirical* estimate of estimation bias (i.e., the gamma_elasticity parameter) is itself biased. This immediately undermines any claim that references the directionality of the elasticity bias (e.g. in the abstract). Concretely, an undirected deficit such as slower learning of elasticity would appear as a directed overestimation bias. When we further consider that elasticity inference is the only meaningful learning/decision-making problem in the task (argued above), the situation becomes much worse. Many general deficits in learning or decision-making would be captured by the elasticity bias parameter. Thus, a conservative interpretation of the results is simply that psychopathology is associated with impaired learning and decision-making.

      We apologize for our imprecise statement that the task was ‘especially sensitive to overestimation of elasticity’, which justifiably led to the Reviewer’s concern that slower elasticity learning can be mistaken for elasticity bias. To make sure this was not the case, we made use of the fact that our computational model explicitly separates bias direction (λ) from the rate of learning through two distinct parameters, which initialize the prior concentration and mean of the model’s initial beliefs concerning elasticity (see Methods pg. 22). The higher the concentration of the initial beliefs (𝜖), the slower the learning. Parameter recovery tests confirmed that our task enables acceptable recovery of both the bias λ<sub>elasticity</sub> (r=.81) and the concentration 𝝐<sub>elasticity</sub> (r=.59) parameters. Importantly, the level of confusion between the parameters was low (confusion of 0.15 for 𝝐<sub>elasticity</sub>→ λ<sub>elasticity</sub> and 0.04 for λ<sub>elasticity</sub>→ 𝝐<sub>elasticity</sub>). This result confirms that our task enables dissociating elasticity biases from the rate of elasticity learning.
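
      The distinction between the two parameters can be illustrated with a mean-and-concentration parameterization of a prior. The sketch below assumes a Beta prior purely for illustration (the response does not state the prior's functional form), and the function names and numerical values are hypothetical.

```python
def beta_from_mean_concentration(mean, concentration):
    """Convert a prior mean (bias) and concentration into Beta pseudo-counts."""
    return mean * concentration, (1.0 - mean) * concentration


def posterior_mean(mean, concentration, successes, failures):
    """Posterior mean after observing binary evidence under the assumed Beta prior."""
    a, b = beta_from_mean_concentration(mean, concentration)
    return (a + successes) / (a + b + successes + failures)


# Same biased prior mean (0.8) and same evidence (2 successes, 8 failures);
# only the prior concentration differs.
weak_prior = posterior_mean(0.8, 2.0, successes=2, failures=8)
strong_prior = posterior_mean(0.8, 20.0, successes=2, failures=8)
```

      With the low concentration the belief moves from 0.8 to 0.3, close to the observed rate of 0.2, whereas with the high concentration it only reaches 0.6. The prior mean sets the direction of the bias, while the concentration sets how slowly evidence overrides it.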

      Moreover, to validate that the minimal level of confusion existing between bias and the rate of learning did not drive our psychopathology results, we re-ran the CCA while separating concentration from bias parameters. The results (Author response image 1) demonstrate that differences in learning rate (𝜖) had virtually no contribution to our CCA results, whereas the contribution of the pure bias (𝜆) was preserved. 

      We will incorporate these clarifications and additional analysis in our revised manuscript.

      Author response image 1.

      Showing that a model parameter correlates with the data it was fit to does not provide any new information, and cannot support claims like "a prior assumption that control is likely available was reflected in a futile investment of resources in uncontrollable environments." To make that claim, one must collect independent measures of the assumption and the investment.

      We apologize if this and related statements seemed to be describing independent findings. They were merely meant to describe the relationship between model parameters and model-independent measures of task performance. It is inaccurate, though, to say that they provide no new information, since results could have been otherwise. For instance, instead of a higher controllability bias primarily associating with futile investment of resources in uncontrollable environments, it could have been primarily associated with more proper investment of resources in high-controllability environments. Additionally, we believe these analyses are of value to readers who seek to understand the role of different parameters in the model. In our planned revision, we will clarify that the relevant analyses are merely descriptive.

      Did participants always make two attempts when purchasing tickets? This seems to violate the intuitive model, in which you would sometimes succeed on the first jump. If so, why was this choice made? Relatedly, it is not clear to me after a close reading how the outcome of each trial was actually determined.

      We thank the reviewer for highlighting the need to clarify these aspects of the task in the revised manuscript. 

      When participants purchased two extra tickets, they attempted both jumps, and were never informed about whether either of them succeeded. Instead, after choosing a vehicle and attempting both jumps, participants were notified where they had arrived. This outcome was determined based on the cumulative probability of either of the two jumps succeeding. Success meant that participants arrived at the destination of their chosen vehicle, whereas failure meant they walked to the nearest location (as determined by where they started from).
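
      The outcome rule described above can be sketched as follows. The sketch assumes that jumps are independent and share a single per-jump success probability, which the response does not state; the function names and arguments are hypothetical.

```python
import random


def success_probability(p_jump, n_jumps):
    """Probability that at least one of n independent jumps succeeds (assumed model)."""
    return 1.0 - (1.0 - p_jump) ** n_jumps


def trial_outcome(p_jump, n_jumps, vehicle_destination, walking_destination, rng):
    """Draw a single outcome per trial; individual jump results are never revealed."""
    if rng.random() < success_probability(p_jump, n_jumps):
        return vehicle_destination
    return walking_destination


# Under these assumptions, two jumps at 0.5 each succeed with probability 0.75.
p_two_jumps = success_probability(0.5, 2)
example = trial_outcome(0.5, 2, "vehicle stop", "nearest location", random.Random(0))
```

      Drawing one outcome from the cumulative probability, rather than resolving each jump in turn, matches the design in which participants commit their investment before seeing any result.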

      Though it is unintuitive to attempt a second jump before seeing whether the first succeeded, this design choice served two key objectives. First, that participants would consistently need to invest not only more money but also more effort and time in planets with high elastic controllability. Second, that the task could potentially generalize to the many real-world situations where the amount of invested effort has to be determined prior to seeing any outcome, for instance, preparing for an exam or a job interview.

      It should be noted that the model is heuristically defined and does not reflect Bayesian updating. In particular, it overestimates control by not using losses with less than 3 tickets (intuitively, the inference here depends on your beliefs about elasticity). I wonder if the forced three-ticket trials in the task might be historically related to this modeling choice.

      We apologize for not making this clear, but in fact losing with less than 3 tickets does reduce the model’s estimate of available control. It does so by increasing the elasticity estimates (a<sub>elastic≥1</sub>, a<sub>elastic2</sub> parameters), signifying that more tickets are needed to obtain the maximum available level of control, thereby reducing the average controllability estimate across ticket investment options.

      It would be interesting to further develop the model such that losing with less than 3 tickets would also impact inferences concerning the maximum available control, depending on present beliefs concerning elasticity, but the forced three-ticket purchases already expose participants to the maximum available control, and thus, the present data may not be best suited to test such a model. These trials were implemented to minimize individual differences concerning inferences of maximum available control, thereby focusing differences on elasticity inferences. We will discuss the Reviewer’s suggestion for a potentially more accurate model in the revised manuscript. 

      References

      (1) Huys, Q. J. M., & Dayan, P. (2009). A Bayesian formulation of behavioral control. Cognition, 113(3), 314–328.

      (2) Ligneul, R. (2021). Prediction or causation? Towards a redefinition of task controllability. Trends in Cognitive Sciences, 25(6), 431–433.

      (3) Mistry, P., & Liljeholm, M. (2016). Instrumental divergence and the value of control. Scientific Reports, 6, 36295.

      (4) Lin, J. (1991). Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory, 37(1), 145–151

      (5) Cohen RM, Weingartner H, Smallberg SA, Pickar D, Murphy DL. Effort and cognition in depression. Arch Gen Psychiatry. 1982 May;39(5):593-7. doi: 10.1001/archpsyc.1982.04290050061012. PMID: 7092490.

      (6) Bi R, Dong W, Zheng Z, Li S, Zhang D. Altered motivation of effortful decision-making for self and others in subthreshold depression. Depress Anxiety. 2022 Aug;39(8-9):633-645. doi: 10.1002/da.23267. Epub 2022 Jun 3. PMID: 35657301; PMCID: PMC9543190.

      (7) Tapal, A., Oren, E., Dar, R., & Eitam, B. (2017). The Sense of Agency Scale: A measure of consciously perceived control over one's mind, body, and the immediate environment. Frontiers in Psychology, 8, 1552

    1. Author response: 

      We thank the reviewers for their feedback on our paper. We have taken all their comments into account in revising the manuscript. We provide a point-by-point response to their comments below.

      Reviewer #1:

      Major comments:

      The manuscript is clearly written with a level of detail that allows others to reproduce the imaging and cell-tracking pipeline. Of the 22 movies recorded one was used for cell tracking. One movie seems sufficient for the second part of the manuscript, as this manuscript presents a proof-of-principle pipeline for an imaging experiment followed by cell tracking and molecular characterisation of the cells by HCR. In addition, cell tracking in a 5-10 day time-lapse movie is an enormous time commitment.

      My only major comment is regarding "Suppl_data_5_spineless_tracking". The image file does not load.

      It looks like the wrong file is linked to the mastodon dataset. The "Current BDV dataset path" is set to "Beryl_data_files/BLB mosaic cut movie-02.xml", but this file does not exist in the folder. Please link it to the correct file.

      We have corrected the file path in the updated version of Suppl. Data 5.

      Minor comments:

      The authors state that their imaging settings aim to reduce photo damage. Do they see cell death in the regenerating legs? Is the cell death induced by the light exposure or can they tell if the same cells die between the movies? That is, do they observe cell death in the same phases of regeneration and/or in the same regions of the regenerating legs?

      Yes, we observe cell death during Parhyale leg regeneration. We have added the following sentence to explain this in the revised manuscript: "During the course of regeneration some cells undergo apoptosis (reported in Alwes et al., 2016). Using the H2B-mRFPruby marker, apoptotic cells appear as bright pyknotic nuclei that break up and become engulfed by circulating phagocytes (see bright specks in Figure 2F)."

      We now also document apoptosis in regenerated legs that have not been subjected to live imaging in a new supplementary figure (Suppl. Figure 3),  and we refer to these observations as follows: "While some cell death might be caused by photodamage, apoptosis can also be observed in similar numbers in regenerating legs that have not been subjected to live imaging (Suppl. Figure 3)."

      Based on 22 movies, the authors divide the regeneration process into three phases and they describe that the timing of leg regeneration varies between individuals. Are the phases proportionally the same length between regenerating legs or do the authors find differences between fast/slow regenerating legs? If there is a difference in the proportions, why might this be?

      Both early and late phases contribute to variation in the speed of regeneration, but there is no clear relationship between the relative duration of each phase and the speed of regeneration. We now present graphs supporting these points in a new supplementary figure (Suppl. Figure 2).  

      To clarify this point, we have added the following sentence in the manuscript: "We find that the overall speed of leg regeneration is determined largely by variation in the speed of the early (wound closure) phase of regeneration, and to a lesser extent by variation in later phases when leg morphogenesis takes place (Suppl. Figure 2 A,B). There is no clear relationship between the relative duration of each phase and the speed of regeneration (Suppl. Figure 2 A',B')."

      Based on their initial cell tracing experiment, could the authors elaborate more on what kind of biological information can be extracted from the cell lineages, apart from determining which is the progenitor of a cell? What does it tell us about the cell population in the tissue? Is there indication of multi- or pluripotent stem cells? What does it say about the type of regeneration that is taking place in terms of epimorphosis and morphallaxis, the old concepts of regeneration?

      In the first paragraph of Future Directions we describe briefly the kind of biological information that could be gained by applying our live imaging approach with appropriate cell-type markers (see below). We do not comment further, as we do not currently have this information at hand. Regarding the concepts of epimorphosis and morphallaxis, as we explain in Alwes et al. 2016, these terms describe two extreme conditions that do not capture what we observe during Parhyale leg regeneration. Our current work does not bring new insights on this topic.

      Page 5. The authors mention the possibility of identifying the cell ID based on transcriptomic profiling data. Can they suggest how many and which cell types they expect to find in the last stage based on their transcriptomic data?

      We have added this sentence: "Using single-nucleus transcriptional profiling, we have identified approximately 15 transcriptionally-distinct cell types in adult Parhyale legs (Almazán et al., 2022), including epidermis, muscle, neurons, hemocytes, and a number of still unidentified cell types."

      Page 6. Correction: "..molecular and other makers.." should be "..molecular and other markers.."

      Corrected

      Page 8. The HCR in situ protocol probably has another important advantage over the conventional in situ protocol, which is not mentioned in this study. The hybridisation step in HCR is performed at a lower temperature (37˚C) than in conventional in situ hybridisation (65˚C, Rehm et al., 2009). In other organisms, a high hybridisation temperature affects the overall tissue morphology and cell location (tissue shrinkage). A lower hybridisation temperature has less impact on the tissue and makes manual cell alignment between the live imaging movie and the fixed HCR in situ stained specimen easier and more reliable. If this is also the case in Parhyale, the authors must mention it.

      This may be correct, but all our specimens were treated at 37˚C, so we cannot assess whether hybridisation temperature affects morphological preservation in our specimens.

      Page 9. The authors should include more information on the spineless study. What is spineless? What do the cell lineages tell about the spineless progenitors, apart from them being spread in the tissue at the time of amputation? Do spineless progenitors proliferate during regeneration? Do any spineless expressing cells share a common progenitor cell?

      We now point out that spineless encodes a transcription factor. We provide a summary of the lineages generating spineless-expressing cells in Suppl. Figure 6, and we explain that "These epidermal progenitors undergo 0, 1 or 2 cell divisions, and generate mostly spineless-expressing cells (Suppl. Figure 5)."

      Page 10. Regarding the imaging temperature, the Materials and Methods state "... a temperature control chamber set to 26 or 27˚C..."; however, in Suppl. Data 1, 26˚C and 29˚C are indicated as imaging temperatures. Which is correct?

      We corrected the Methods by adding "with the exception of dataset li51, imaged at 29°C"

      Page 10. Regarding the imaging step size, the Materials and Methods state "...step size of 1-2.46 µm..."; however, Suppl. Data 1 indicate a step size between 1.24 - 2.48 µm. Which is correct?

      We corrected the Methods.

      Page 11. Correct "...as the highest resolution data..." to "...at the highest resolution data..."

      The original text is correct ("standardised to the same dimensions as the highest resolution data").

      Page 11. Indicate which supplementary data set is referred to: "Using Mastodon, we generated ground truth annotations on the original image dataset, consisting of 278 cell tracks, including 13,888 spots and 13,610 links across 55 time points (see Supplementary Data)."

      Corrected

      p. 15. Indicate which supplementary data set is referred to: "In this study we used HCR probes for the Parhyale orthologues of futsch (MSTRG.441), nompA (MSTRG.6903) and spineless (MSTRG.197), ordered from Molecular Instruments (20 oligonucleotides per probe set). The transcript sequences targeted by each probe set are given in the Supplementary Data."

      Corrected

      Figure 3. Suggestion to the overview schematics: The authors might consider adding "molting" as the end point of the red bar (representing differentiation).

      The time of molting is not known in the majority of these datasets, because the specimens were fixed and stained prior to molting. We added the relevant information in the figure legend: "Datasets li-13 and li-16 were recorded until the molt; the other recordings were stopped before molting."

      Figure 4B': Please indicate that the nuclei signal is DAPI.

      Corrected

      Supplementary figure 1A. Word is missing in the figure legend: ...the image also shows weak…

      Corrected

      Supplementary Figure 2: Please indicate the autofluorescence in the granular cells. Does it correspond to the yellow cells?

      Corrected

      Video legend for video 1 and 2. Please correct "H2B-mREFruby" to "H2B-mRFPruby".

      Corrected

      Reviewer #2:

      Major comments:

      MC 1. Given that most of the technical advances necessary to achieve the work described in this manuscript have been published previously, it would be helpful for the authors to more clearly identify the primary novelty of this manuscript. The abstract and introduction to the manuscript focus heavily on the technical details of imaging and analysis optimization and some additional summary of the implications of these advances should be included here to aid the reader.

      This paper describes a technical advance. While previous work (Alwes et al. 2016) established some key elements of our live imaging approach, we were not at that time able to record the entire time course of leg regeneration (the longest recordings were 3.5 days long). Here we present a method for imaging the entire course of leg regeneration (up to 10 days of imaging), optimised to reduce photodamage and to improve cell tracking. We also develop a method of in situ staining in cuticularised adult legs (an important technical breakthrough in this experimental system), which we combine with live imaging to determine the fate of tracked cells. We have revised the abstract and introduction of the paper to point out these novelties, in relation to our previous publications.

      In the abstract we explain: "Building on previous work that allowed us to image different parts of the process of leg regeneration in the crustacean Parhyale hawaiensis, we present here a method for live imaging that captures the entire process of leg regeneration, spanning up to 10 days, at cellular resolution. Our method includes (1) mounting and long-term live imaging of regenerating legs under conditions that yield high spatial and temporal resolution but minimise photodamage, (2) fixing and in situ staining of the regenerated legs that were imaged, to identify cell fates, and (3) computer-assisted cell tracking to determine the cell lineages and progenitors of identified cells. The method is optimised to limit light exposure while maximising tracking efficiency."

      The introduction includes the following text: "Our first systematic study using this approach presented continuous live imaging over periods of 2-3 days, capturing key events of leg regeneration such as wound closure, cell proliferation and morphogenesis of regenerating legs with single-cell resolution (Alwes et al., 2016). Here, we extend this work by developing a method for imaging the entire course of leg regeneration, optimised to reduce photodamage and to improve cell tracking. We also develop a method of in situ staining of gene expression in cuticularised adult legs, which we combine with live imaging to determine the fate of tracked cells."

      MC 2. The description of the regeneration time course is nicely detailed but also very qualitative. A major advantage of continuous recording and automated cell tracking in the manner presented in this manuscript would be to enable deeper quantitative characterization of cellular and tissue dynamics during regeneration. Rather than providing movies and manually annotated timelines, some characterization of the dynamics of the regeneration process (the heterogeneity in this is very very interesting, but not analyzed at all) and correlating them against cellular behaviors would dramatically increase the impact of the work and leverage the advances presented here. For example, do migration rates differ between replicates? Division rates? Division synchrony? Migration orientation? This seems to be an incredibly rich dataset that would be fascinating to explore in greater detail, which seems to me to be the primary advance presented in this manuscript. I can appreciate that the authors may want to segregate some biological findings from the method, but I believe some nominal effort highlighting the quantitative nature of what this method enables would strengthen the impact of the paper and be useful for the reader. Selecting a small number of simple metrics (eg. Division frequency, average cell migration speed) and plotting them alongside the qualitative phases of the regeneration timeline that have already been generated would be a fairly modest investment of effort using tools that already exist in the Mastodon interface, I would roughly estimate on the order of an hour or two per dataset. I believe that this effort would be well worth it and better highlight a major strength of the approach.

      The primary goal of this work was to establish a robust method for continuous long-term live imaging of regeneration, but we do appreciate that a more quantitative analysis would add value to the data we are presenting. We tried to address this request in three steps:

      First, we examined whether clear temporal patterns in cell division, cell movements or other cellular features can be observed in an accurately tracked dataset (li13-t4, tracked in Sugawara et al. 2022). To test this we used the feature extraction functions now available on the Mastodon platform (see link). We could discern a meaningful temporal pattern for cell divisions (see below); the other features showed no interpretable pattern of variation.
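
      As a rough illustration of how a temporal pattern of cell divisions can be read from tracking data, the sketch below counts divisions per time bin in a generic table of spots and links. It does not use the Mastodon API; the data structures (`spot_frames`, `links`) and the toy lineage are hypothetical, and a division is assumed to be any spot with two or more outgoing links.

```python
from collections import Counter, defaultdict


def division_counts(spot_frames, links, bin_size):
    """Count divisions per time bin.

    spot_frames: {spot_id: frame index}
    links: iterable of (parent_spot_id, child_spot_id)
    A division is assumed to be any spot with at least two children.
    """
    children = defaultdict(list)
    for parent, child in links:
        children[parent].append(child)
    counts = Counter()
    for parent, kids in children.items():
        if len(kids) >= 2:
            counts[spot_frames[parent] // bin_size] += 1
    return dict(counts)


# Toy lineage: spot 2 divides at frame 1, and its daughter lineage divides at frame 3.
spot_frames = {1: 0, 2: 1, 3: 2, 4: 2, 5: 3, 6: 4, 7: 4}
links = [(1, 2), (2, 3), (2, 4), (4, 5), (5, 6), (5, 7)]
per_bin = division_counts(spot_frames, links, bin_size=2)
```

      Plotting such counts against the qualitative phases of regeneration is one simple way to present a temporal pattern of divisions.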

      Second, we asked whether we could use automated cell tracking to analyse the patterns of cell division in all our datasets. Using an Elephant deep learning model trained on the tracks of the li13-t4 dataset, we performed automated cell tracking in the same dataset, and compared the pattern of cell divisions from the automated cell track predictions with those coming from manually validated cell tracks. We observed that the automated tracks gave very imprecise results, with a high background of false positives obscuring the real temporal pattern (see images below, with validated data on the left, automated tracking on the right). These results show that the automated cell tracking is not accurate enough to provide a meaningful picture of the pattern of cell divisions.

      Third, we tried to improve the accuracy of detection of dividing cells by additional training of Elephant models on each dataset (to lower the rate of false positives), followed by manual proofreading. Given how labour intensive this is, we could only apply this approach to 4 additional datasets. The results of this analysis are presented in Figure 4.

      Author response image 1.

      MC 3. The authors describe the challenges faced by their described approach:

      Using this mode of semi-automated and manual cell tracking, we find that most cells in the upper slices of our image stacks (top 30 microns) can be tracked with a high degree of confidence. A smaller proportion of cell lineages are trackable in the deeper layers.

      Given that the authors quantify this in Table 1, it would aid the reader to provide metrics in the manuscript text at this point. Furthermore, the metrics provided in Table 1 appear to be for overall performance, but the text describes that performance appears to be heavily depth dependent. Segregating the performance metrics further, for example providing DET, TRA, precision and recall for superficial layers only and for the overall dataset, would help support these arguments and better highlight performance a potential adopter of the method might expect.

      In the revised manuscript we have added data on the tracking performance of Elephant in relation to imaging depth in Suppl. Figure 3. These data confirm our original statement (which was based on manual tracking) that nuclei are more challenging to track in deeper layers.

      We point to these new results in two parts of the paper, as follows: "A smaller proportion of cells are trackable in the deeper layers (see Suppl. Figure 3)", and "Our results, summarised in Table 1A, show that the detection of nuclei can be enhanced by doubling the z resolution at the expense of xy resolution and image quality. This improvement is particularly evident in the deeper layers of the imaging stacks, which are usually the most challenging to track (Suppl. Figure 3)."
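
      The depth-segregated reporting discussed above can be sketched as follows. This is a minimal illustration, not the procedure used for Suppl. Figure 3 or by Elephant: the function name `detection_metrics_by_depth`, the record format, and the bin edges are all hypothetical, and only precision and recall are computed (DET and TRA, the Cell Tracking Challenge measures, are not reproduced here). The 30 µm boundary in the test mirrors the "top 30 microns" mentioned in the text.

```python
from collections import defaultdict


def detection_metrics_by_depth(records, bin_edges):
    """Compute (precision, recall) per depth bin.

    records   : iterable of (depth_um, outcome), outcome in {"TP", "FP", "FN"}
                (hypothetical format: one entry per detection or missed nucleus)
    bin_edges : increasing depth boundaries in microns, e.g. [0, 30, 60]
    """
    counts = defaultdict(lambda: {"TP": 0, "FP": 0, "FN": 0})
    for depth, outcome in records:
        # Assign the record to the half-open bin [low, high) that contains it.
        for low, high in zip(bin_edges[:-1], bin_edges[1:]):
            if low <= depth < high:
                counts[(low, high)][outcome] += 1
                break

    metrics = {}
    for depth_bin, c in sorted(counts.items()):
        tp, fp, fn = c["TP"], c["FP"], c["FN"]
        precision = tp / (tp + fp) if (tp + fp) else float("nan")
        recall = tp / (tp + fn) if (tp + fn) else float("nan")
        metrics[depth_bin] = (precision, recall)
    return metrics
```

      Reporting one row per depth bin next to the overall figures would let a reader see directly how much of the average performance comes from the superficial layers.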

      MC 4. Performance characterization in Table 1 appears to derive from a single dataset that is then subsampled and processed in different ways to assess the impact of these changes on cell tracking and detection performance. While this is a suitable strategy for this type of optimization it leaves open the question of performance consistency across datasets. I fully recognize that this type of quantification can be onerous and time consuming, but some attempt to assess performance variability across datasets would be valuable. Manual curation over a short time window over a random sampling of the acquired data would be sufficient to assess this.

      We think that similar trade-offs will apply to all our datasets because tracking performance is constrained by the same features, which are intrinsic to our system; e.g. by the crowding of nuclei in relation to axial resolution, or the speed of mitosis in relation to the temporal resolution of imaging. We therefore do not see a clear rationale for repeating this analysis. On a practical level, our existing image datasets could not be subsampled to generate the various conditions tested in Table 1, so proving this point experimentally would require generating new recordings, and tracking these to generate ground truth data. This would require months of additional work.

      A second, related question is whether Elephant would perform equally well in detecting and tracking nuclei across different datasets. This point has been addressed in the Sugawara et al. 2022 paper, where the performance of Elephant was tested on diverse fluorescence datasets.

      Reviewer #3:

      Major comments:

      • The authors should clearly specify what are the key technical improvements compared to their previous studies (Alwes et al. 2016, Elife; Konstantinides & Averof 2014, Science). There, the approaches for mounting, imaging, and cell tracking are already introduced, and the imaging is reported to run for up to 7 days in some cases.

      In Konstantinides and Averof (2014) we did not present any live imaging at cellular resolution. In Alwes et al. (2016) we described key elements of our live imaging approach, but we were never able to record the entire time course of leg regeneration. The longest recordings in that work were 3.5 days long.

      We have revised the abstract and introduction to clarify the novelty of this work, in relation to our previous publications. Please see our response to comment MC1 of reviewer 2.

      • While the authors mention testing the effect of imaging parameters (such as scanning speed and line averaging) on the imaging/tracking outcome, very little or no information is provided on how this was done beyond the parameters that they finally arrived at.

      Scan speed and averaging parameters were determined by measuring contrast and signal-to-noise ratios in images captured over a range of settings. We have now added these data in Supplementary Figure 1.

      • The authors claim that, using the acquired live imaging data across entire regeneration time course, they are now able to confirm and extend their description of leg regeneration. However, many claims about the order and timing of various cellular events during regeneration are supported only by references to individual snapshots in figures or supplementary movies. Presenting a more quantitative description of cellular processes during regeneration from the acquired data would significantly enhance the manuscript and showcase the usefulness of the improved workflow.

      The events we describe can be easily observed in the maximum projections, available in Suppl. Data 2. Regarding the quantitative analysis, please see our response to comment MC2 of reviewer 2.  

      • Table 1 summarizes the performance of cell tracking using simulated datasets of different quality. However only averages and/or maxima are given for the different metrics, which makes it difficult to evaluate the associated conclusions. In some cases, only 1 or 2 test runs were performed.

      The metrics extracted from each of the three replicates, per dataset, are now included in Suppl. Data 4.

      We consistently used 3 replicates to measure tracking performance with each of the datasets. The "replicates" column label in Table 1 referred to the number of scans that were averaged to generate the image, not to the replicates used for estimating the tracking performance. To avoid confusion, we changed that label to "averaging".

      • OPTIONAL: An imaging approach that allows using the current mounting strategy but could help with some of the tradeoffs is using a spinning-disk confocal microscope instead of a laser scanning one. If the authors have such a system available, it could be interesting to compare it with their current scanning confocal setup.

      Preliminary experiments that we carried out several years ago on a spinning disk confocal (with a 20x objective and the CSU-W1 spinning disk) were not very encouraging, and we therefore did not pursue this approach further. The main problem was bad image quality in deeper tissue layers.

      Minor comments:

      • The presented imaging protocol was optimized for one laser wavelength only (561 nm) - this should be mentioned when discussing the technical limitations since animals tend to react differently to different wavelengths. Same settings might thus not be applicable for imaging a different fluorescent protein.

      In the second paragraph of the Results section, we explain that we perform the imaging at long wavelengths in order to minimise photodamage. It should be clear to the readers that changing the excitation wavelength will have an impact for long-term live imaging.

      • For transferability, it would be useful if the intensity of laser illumination was measured and given in the Methods, instead of just a relative intensity setting from the imaging software. Similarly, more details of the imaging system should be provided where appropriate (e.g., detector specifications).

      We have now measured the intensity of the laser illumination and added this information in the Methods: "Laser power was typically set to 0.3% to 0.8%, which yields 0.51 to 1.37 µW at 561 nm (measured with a ThorLabs Microscope Slide Power Sensor, #S170C)."

      Regarding the imaging system and the detector, we provide all the information that is available to us on the microscope's technical sheets.

      • The versions of analysis scripts associated with the manuscript should be uploaded to an online repository that permanently preserves the respective version.

      The scripts are now available on GitHub and in online repositories. The relevant links are included in the revised manuscript.

    1. Reviewer #2 (Public Review):

      Summary:

      The goal of the authors in this study is to develop a more reliable approach for quantifying codon usage such that it is more comparable across species. Specifically, the authors wish to estimate the degree of adaptive codon usage, which is potentially a general proxy for the strength of selection at the molecular level. To this end, the authors created the Codon Adaptation Index for Species (CAIS) that controls for differences in amino acid usage and GC% across species. Using their new metric, the authors find a previously unobserved negative correlation between the overall adaptiveness of codon usage and body size across 118 vertebrates. As body size is negatively correlated with effective population size and thus the general strength of natural selection, the negative correlation between CAIS and body size is expected. The authors argue this was previously unobserved due to failures of other popular metrics such as Codon Adaptation Index (CAI) and the Effective Number of Codons (ENC) to adequately control for differences in amino acid usage and GC content across species. Most surprisingly, the authors also find a positive relationship between CAIS and the overall "disorderedness" of a species protein domains. As some of these results are unexpected, which is acknowledged by the authors, I think it would be particularly beneficial to work with some simulated datasets. I think CAIS has the potential to be a valuable tool for those interested in comparing codon adaptation across species in certain situations. However, I have certain theoretical concerns about CAIS as a direct proxy for the efficiency of selection when the mutation bias changes across species.

      Strengths:

      (1) I appreciate that the authors recognize the potential issues of comparing CAI when amino acid usage varies and correct for this in CAIS. I think this is sometimes an under-appreciated point in the codon usage literature, as CAI is a relative measure of codon usage bias (i.e. only considers synonyms). However, the strength of natural selection on codon usage can potentially vary across amino acids, such that comparing mean CAI between protein regions with different amino acid biases may result in spurious signals of statistical significance (see Cope et al. Biochimica et Biophysica Acta - Biomembranes 2018 for a clear example of this).

      (2) The authors present numerous analyses using both ENC and mean CAI as a comparison to CAIS, helping give a sense of how CAIS corrects for some of the issues with these other metrics. I also enjoyed that they examined the previously unobserved relationship between codon usage bias and body size, which has bugged me ever since I saw Kessler and Dean 2014. The result comparing protein disorder to CAIS was particularly interesting and unexpected.

      (3) The CAIS metric presented here is generally applicable to any species that has an annotated genome with protein-coding sequences.

      Weaknesses:

      (1) The main weakness of this work is that it lacks simulated data to confirm that it works as expected. This would be particularly useful for assessing the relationship between CAIS and the overall effect of protein structure disorder, which the authors acknowledge is an unexpected result. I think simulations could also allow the authors to assess how their metric performs in situations where mutation bias and natural selection act in the same direction vs. opposite directions. Additionally, although I appreciate their comparisons to ENC and mean CAI, a comparison to other popular codon metrics for calculating the overall adaptiveness of a genome (e.g. dos Reis et al.'s statistic, which is a function of tRNA Adaptation Index (tAI) and ENC) may be more appropriate. Even if results are similar to those from dos Reis et al.'s statistic, CAIS has a noted advantage that it doesn't require identifying tRNA gene copy numbers or abundances, which I think are generally less readily available than genomic GC% and protein-coding sequences.

      The authors mention the selection-mutation-drift equilibrium model, which underlies the basic ideas of this work (e.g. higher effective population size results in stronger selection on codon usage), but a more in-depth framing of CAIS in terms of this model is not given. I think this could be valuable, particularly in addressing the question "are we really estimating what we think we're estimating?"

      Let's take a closer look at the formulation for RSCUS. From here on out, subscripts will only be used to denote the codon, and it will be assumed that we are only considering a single amino acid in some species:

      \[ \mathrm{RSCUS}_i = \frac{O_i}{E_i} \]

      where \(O_i\) is the observed frequency of codon \(i\) and \(E_i\) is its expected frequency based on GC content.

      I think what the authors are attempting to do is "divide out" the effects of mutation bias (as given by \(E_i\)), such that only the effects of natural selection remain, i.e. deviations from the expected frequency based on mutation bias alone represent adaptive codon usage. Consider Gilchrist et al. MBE 2015, which says that the expected frequency of codon \(i\) at selection-mutation-drift equilibrium in gene \(g\) for an amino acid with \(N_a\) synonymous codons is

      \[ p_{i,g} = \frac{e^{-\Delta M_i - \Delta\eta_i \phi_g}}{\sum_{j=1}^{N_a} e^{-\Delta M_j - \Delta\eta_j \phi_g}} \qquad (1) \]

      where \(\Delta M\) is the mutation bias, \(\Delta\eta\) is the strength of selection scaled by the strength of drift, and \(\phi_g\) is the gene expression level of gene \(g\). In this case, \(\Delta M\) and \(\Delta\eta\) reflect the strength and direction of mutation bias and natural selection relative to a reference codon, for which \(\Delta M = \Delta\eta = 0\). Assuming the selection-mutation-drift equilibrium model is generally adequate to model the true codon usage patterns in a genome (as I do and I think the authors do, too), then \(p_{i,g}\) could be considered the expected observed frequency of codon \(i\) in gene \(g\), \(E[O_i]\).

      Let's re-write \(E_i\) in the form of Gilchrist et al., such that it is a function of mutation bias \(\Delta M\). For simplicity, we will consider just the two-codon case (codons NNA and NNG) and assume the amino acid sequence is fixed. Assuming GC% is at equilibrium, the terms \(\mathrm{GC}\) and \(1 - \mathrm{GC}\) can be written as

      \[ \mathrm{GC} = \frac{\mu_{AT \to GC}}{\mu_{AT \to GC} + \mu_{GC \to AT}}, \qquad 1 - \mathrm{GC} = \frac{\mu_{GC \to AT}}{\mu_{AT \to GC} + \mu_{GC \to AT}} \]

      where \(\mu_{x \to y}\) is the mutation rate from nucleotides \(x\) to \(y\). As described in Gilchrist et al. MBE 2015 and Shah and Gilchrist PNAS 2011, the mutation bias of NNA relative to NNG is \(\Delta M_{\mathrm{NNA}} = \log(\mu_{AT \to GC} / \mu_{GC \to AT})\). This can be expressed in terms of the equilibrium GC content by recognizing that

      \[ \frac{\mu_{AT \to GC}}{\mu_{GC \to AT}} = \frac{\mathrm{GC}}{1 - \mathrm{GC}} \]

      As we are assuming the amino acid sequence is fixed, the probability of observing a synonymous codon at an amino acid becomes just a Bernoulli process:

      \[ E_{\mathrm{NNG}} = \mathrm{GC}, \qquad E_{\mathrm{NNA}} = 1 - \mathrm{GC} \]

      If we do this, then

      \[ E_{\mathrm{NNA}} = \frac{e^{-\Delta M_{\mathrm{NNA}}}}{1 + e^{-\Delta M_{\mathrm{NNA}}}}, \qquad E_{\mathrm{NNG}} = \frac{1}{1 + e^{-\Delta M_{\mathrm{NNA}}}} \qquad (6) \]

      Recall that in the Gilchrist et al. framework, the reference codon has \(\Delta M = 0\). Thus, we have recovered the Gilchrist et al. model from the formulation of \(E_i\) under the assumption that natural selection has no impact on codon usage and codon NNG is the pre-defined reference codon. To see this, plug in 0 for \(\Delta\eta\) in equation (1).

      We can then calculate the expected RSCUS using equation (1) (using notation \(E[O_i]\)) and equation (6) for the two-codon case. For simplicity, assume we are only considering a gene of average expression (defined as \(\bar{\phi} = 1\)). Assume in this case that NNG is the reference codon (\(\Delta M_{\mathrm{NNG}} = \Delta\eta_{\mathrm{NNG}} = 0\)).

      \[ E[\mathrm{RSCUS}_{\mathrm{NNA}}] = \frac{E[O_{\mathrm{NNA}}]}{E_{\mathrm{NNA}}} = \frac{e^{-\Delta\eta_{\mathrm{NNA}}} \left(1 + e^{-\Delta M_{\mathrm{NNA}}}\right)}{1 + e^{-\Delta M_{\mathrm{NNA}} - \Delta\eta_{\mathrm{NNA}}}} \]

      This shows that the expected value of RSCUS for a two-codon amino acid is expected to increase as the strength of selection increases, which is desired. Note that in Gilchrist et al. \(\Delta\eta\) is formulated in terms of selection against a codon relative to the reference, such that a negative value represents that a codon is favored relative to the reference. If \(\Delta\eta_{\mathrm{NNA}} = 0\) (i.e. selection does not favor either codon), then \(E[\mathrm{RSCUS}_{\mathrm{NNA}}] = 1\). Also note that the expected RSCUS does not remain independent of the mutation bias. This means that even if \(\Delta\eta\) (i.e. the strength of natural selection) does not change between species, changes to the strength and direction of mutation bias across species could impact RSCUS. Assuming my math is right, I think one needs to be cautious when interpreting CAIS as representative of the differences in the efficiency of selection across species except under very particular circumstances. One such case could be when it is known that mutation bias varies little across the species of interest. Looking at the species used in this manuscript, most of them have a GC content ranging around 0.41, so I suspect their results are okay.

      Although I have not done so, I am sure this could be extended to the 4 and 6 codon amino acids.
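
      The two-codon argument above can be checked numerically with a small sketch. It assumes the selection-mutation-drift form of Gilchrist et al., with codon weights proportional to exp(-mutation bias - selection × expression) and the reference codon fixed at zero for both terms, and it assumes that the expected frequency in the RSCUS denominator is the same model with selection set to zero. The function names and the example GC values other than 0.41 are hypothetical and are not taken from the CAIS code.

```python
import math


def codon_probs(delta_m, delta_eta, phi=1.0):
    """Equilibrium frequencies (non-reference codon, reference codon)
    for a two-codon amino acid; the reference codon has weight exp(0) = 1."""
    w = math.exp(-delta_m - delta_eta * phi)
    return w / (1.0 + w), 1.0 / (1.0 + w)


def expected_rscus(delta_m, delta_eta, phi=1.0):
    """Expected RSCUS of the non-reference codon: the frequency under
    mutation plus selection divided by the mutation-only expectation."""
    observed, _ = codon_probs(delta_m, delta_eta, phi)
    expected, _ = codon_probs(delta_m, 0.0, phi)
    return observed / expected


def delta_m_from_gc(gc):
    """Mutation bias of NNA relative to reference NNG implied by an
    equilibrium GC content (assumed relation: log(GC / (1 - GC)))."""
    return math.log(gc / (1.0 - gc))


if __name__ == "__main__":
    for gc in (0.41, 0.60):
        dm = delta_m_from_gc(gc)
        # Same selection strength (NNA favored), different mutation bias.
        print(gc, expected_rscus(dm, 0.0), expected_rscus(dm, -0.5))
```

      With no selection the ratio is 1 for any GC content, whereas for a fixed non-zero selection term the ratio changes with GC content, which is the dependence on mutation bias that the argument warns about.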

      Another minor weakness of this work is that although the method is generally applicable to any species with an annotated genome and the code is publicly available, the code itself contains hard-coded values for GC% and amino acid frequencies across the 118 vertebrates. The lack of a more flexible tool may make it difficult for less computationally-experienced researchers to take advantage of this method.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The image analysis pipeline is tested in analysing microscopy imaging data of gastruloids of varying sizes, for which an optimised protocol for in toto image acquisition is established based on whole mount sample preparation using an optimal refractive index matched mounting media, opposing dual side imaging with two-photon microscopy for enhanced laser penetration, dual view registration, and weighted fusion for improved in toto sample data representation. For enhanced imaging speed in a two-photon microscope, parallel imaging was used, and the authors performed spectral unmixing analysis to avoid issues of signal cross-talk.

      In the image analysis pipeline, different pre-treatments are done depending on the analysis to be performed (for nuclear segmentation - contrast enhancement and normalisation; for quantitative analysis of gene expression - corrections for optical artifacts inducing signal intensity variations). Stardist3D was used for the nuclear segmentation. The study analyses properties of gastruloid nuclear density, patterns of cell division, morphology, deformation, and gene expression.

      Strengths:

      The methods developed are sound, well described, and well-validated, using a sample challenging for microscopy, gastruloids. Many of the established methods are very useful (e.g. registration, corrections, signal normalisation, lazy loading bioimage visualisation, spectral decomposition analysis), facilitate the development of quantitative research, and would be of interest to the wider scientific community.

      We thank the reviewer for this positive feedback.

      Weaknesses:

      A recommendation should be added on when or under which conditions to use this pipeline.

      We thank the reviewer for this valuable feedback, which will be addressed in the revision. In general, the pipeline is applicable to any tissue, but it is particularly useful for large and dense 3D samples—such as organoids, embryos, explants, spheroids, or tumors—that are typically composed of multiple cell layers and have a thickness greater than 50 µm.

      The processing and analysis pipeline are compatible with any type of 3D imaging data (e.g. confocal, 2 photon, light-sheet, live or fixed).

      - Spectral unmixing to remove signal cross-talk of multiple fluorescent targets is typically more relevant in two-photon imaging due to the broader excitation spectra of fluorophores compared to single-photon imaging. In confocal or light-sheet microscopy, alternating excitation wavelengths often circumvents the need for unmixing. Spectral decomposition performs even better with true spectral detectors; however, these are usually not non-descanned detectors, which are more appropriate for deep tissue imaging. Our approach demonstrates that simultaneous cross-talk-free four-color two-photon imaging can be achieved in dense 3D specimens with four non-descanned detectors and co-excitation by just two laser lines. Depending on the dispersion in optically dense samples, depth-dependent apparent emission spectra need to be considered.

      - Nuclei segmentation using our trained StarDist3D model is applicable to any system under two conditions: (1) the nuclei exhibit a star-convex shape, as required by the StarDist architecture, and (2) the image resolution is sufficient in XYZ to allow resampling. The exact sampling required is object- and system-dependent, but the goal is to achieve nearly isotropic objects with diameters of approximately 15 pixels while maintaining image quality. In practice, images containing objects that are natively close to or larger than 15 pixels in diameter should segment well after resampling. Conversely, images with objects that are significantly smaller along one or more dimensions will require careful inspection of the segmentation results.

      - Normalization is broadly applicable to multicolor data when at least one channel is expected to be ubiquitously expressed within its domain. Wavelength-dependent correction requires experimental calibration using a ubiquitous signal at each wavelength. Importantly, this calibration only needs to be performed once for a given set of experimental conditions (e.g., fluorophores, tissue type, mounting medium).

      - Multi-scale analysis of gene expression and morphometrics is applicable to any 3D multicolor image. This includes both the 3D visualization tools (Napari plugins) and the various analytical plots (e.g., correlation plots, radial analysis). Multi-scale analysis can be performed even with imperfect segmentation, as long as segmentation errors tend to cancel out when averaged locally at the relevant spatial scale. However, systematic errors—such as segmentation uncertainty along the Z-axis due to strong anisotropy—may accumulate and introduce bias in downstream analyses. Caution is advised when analyzing hollow structures (e.g., curved epithelial monolayers with large cavities), as the pipeline was developed primarily for 3D bulk tissues, and appropriate masking of cavities would be needed.
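
      The resampling criterion in the segmentation item above (nearly isotropic objects of roughly 15 pixels in diameter) can be turned into per-axis zoom factors as in the sketch below. This is an illustration under stated assumptions, not the pipeline's own resampling code: the function names, the example voxel sizes, and the 9 µm nucleus diameter are hypothetical. The resulting factors could, for example, be passed to `scipy.ndimage.zoom`.

```python
def native_diameter_px(voxel_size_zyx, object_diameter_um):
    """Object diameter in pixels along each axis before resampling."""
    return tuple(object_diameter_um / v for v in voxel_size_zyx)


def isotropic_zoom_factors(voxel_size_zyx, object_diameter_um,
                           target_diameter_px=15.0):
    """Zoom factor per axis so that an object of the given physical
    diameter spans about target_diameter_px pixels in every direction.

    voxel_size_zyx     : physical voxel size (z, y, x) in microns
    object_diameter_um : typical nucleus diameter in microns (assumed known)
    """
    # Voxel size at which the object spans exactly the target diameter.
    target_voxel_um = object_diameter_um / target_diameter_px
    # A factor above 1 means upsampling along that axis, below 1 downsampling.
    return tuple(v / target_voxel_um for v in voxel_size_zyx)


if __name__ == "__main__":
    voxel = (2.0, 0.6, 0.6)  # hypothetical anisotropic acquisition
    print(native_diameter_px(voxel, 9.0))
    print(isotropic_zoom_factors(voxel, 9.0))
```

      An axis whose native diameter is far below 15 pixels needs a large upsampling factor, which is the situation in which the text recommends inspecting the segmentation results carefully.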

      Reviewer #2 (Public review):

      Summary:

      This study presents an integrated experimental and computational pipeline for high-resolution, quantitative imaging and analysis of gastruloids. The experimental module employs dual-view two-photon spectral imaging combined with optimized clearing and mounting techniques to image whole-mount immunostained gastruloids. This approach enables the acquisition of comprehensive 3D images that capture both tissue-scale and single-cell level information.

      The computational module encompasses both pre-processing of acquired images and downstream analysis, providing quantitative insights into the structural and molecular characteristics of gastruloids. The pre-processing pipeline, tailored for dual-view two-photon microscopy, includes spectral unmixing of fluorescence signals using depth-dependent spectral profiles, as well as image fusion via rigid 3D transformation based on content-based block-matching algorithms. Nuclei segmentation was performed using a custom-trained StarDist3D model, validated against 2D manual annotations, and achieving an F1 score of 85+/-3% at a 50% intersection-over-union (IoU) threshold. Another custom-trained StarDist3D model enabled accurate detection of proliferating cells and the generation of 3D spatial maps of nuclear density and proliferation probability. Moreover, the pipeline facilitates detailed morphometric analysis of cell density and nuclear deformation, revealing pronounced spatial heterogeneities during early gastruloid morphogenesis.
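
      For readers unfamiliar with the metric quoted above, an F1 score at a 50% IoU threshold can be computed as in the following sketch. It is a simplified illustration, not the evaluation code used in the study: objects are represented as sets of pixel coordinates, matching is greedy rather than an optimal assignment, and the function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two objects given as sets of pixel coordinates."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def f1_at_iou(predicted, truth, threshold=0.5):
    """F1 score where a prediction counts as a true positive if it overlaps a
    not-yet-matched ground-truth object with IoU >= threshold (greedy matching)."""
    matched = set()
    tp = 0
    for p in predicted:
        best_j, best_iou = None, 0.0
        for j, t in enumerate(truth):
            if j in matched:
                continue
            score = iou(p, t)
            if score > best_iou:
                best_j, best_iou = j, score
        if best_j is not None and best_iou >= threshold:
            matched.add(best_j)
            tp += 1
    fp = len(predicted) - tp  # predictions without a matching ground-truth object
    fn = len(truth) - tp      # ground-truth objects that were missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

      Because the score depends on the chosen threshold, values reported at a 50% IoU threshold are not directly comparable with values reported at stricter thresholds.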

      All computational tools developed in this study are released as open-source, Python-based software.

      Strengths:

      The authors applied two-photon microscopy to whole-mount deep imaging of gastruloids, achieving in toto visualization at single-cell resolution. By combining spectral imaging with an unmixing algorithm, they successfully separated four fluorescent signals, enabling spatial analysis of gene expression patterns.

      The entire computational workflow, from image pre-processing to segmentation with a custom-trained StarDist3D model and subsequent quantitative analysis, is made available as open-source software. In addition, user-friendly interfaces are provided through the open-source, community-driven Napari platform, facilitating interactive exploration and analysis.

      We thank the reviewer for this positive feedback.

      Weaknesses:

      The computational module appears promising. However, the analysis pipeline has not been validated on datasets beyond those generated by the authors, making it difficult to assess its general applicability.

      We agree that applying our analysis pipeline to published datasets—particularly those acquired with different imaging systems—would be valuable. However, only a few high-resolution datasets of large organoid samples are publicly available, and most of these either lack multiple fluorescence channels or represent 3D hollow structures. Our computational pipeline consists of several independent modules: spectral filtering, dual-view registration, local contrast enhancement, 3D nuclei segmentation, image normalization based on a ubiquitous marker, and multiscale analysis of gene expression and morphometrics.

      Spectral filtering has already been applied in other systems (e.g. [7] and [8]), but is here extended to account for imaging depth-dependent apparent emission spectra of the different fluorophores. In our pipeline, we provide code to run spectral filtering on multichannel images, integrated in Python. In order to apply the spectral filtering algorithm utilized here, spectral patterns of each fluorophore need to be calibrated as a function of imaging depth, which depend on the specific emission windows and detector settings of the microscope.

      Image normalization using a wavelength-dependent correction also requires calibration on a given imaging setup to measure the difference in signal decay among the different fluorophore species. To our knowledge, the calibration procedures for spectral filtering and our image-normalization approach have not been performed previously in 3D samples, which is why validation on published datasets is not readily possible. Nevertheless, they are described in detail in the Methods section, and the code used, from the calibration measurements to the corrected images, is available open-source at the Zenodo link in the manuscript.

      Dual-view registration, local contrast enhancement, and multiscale analysis of gene expression and morphometrics are not limited to organoid data or our specific imaging modalities. If we identify suitable datasets to validate these modules, we will include them in the revised manuscript.

      To evaluate our 3D nuclei segmentation model, we plan to test it on diverse systems, including gastruloids stained with the nuclear marker DRAQ5 from Moos et al. [1]; breast cancer spheroids; primary ductal adenocarcinoma organoids; human colon organoids and HCT116 monolayers from Ong et al. [2]; and zebrafish tissues imaged by confocal microscopy from Li et al. [3]. These datasets were acquired using either light-sheet or confocal microscopy, with varying imaging parameters (e.g., objective lens, pixel size, staining method).

      Preliminary results are promising (see Author response image 1). We will provide quantitative comparisons of our model’s performance on these datasets, using annotations or reference predictions provided by the original authors where available.

      Author response image 1.

      Qualitative comparison of our custom StarDist3D segmentation strategy on diverse published 3D nuclei datasets. We show one slice from the XY plane for simplicity. (a) Gastruloid stained with the nuclear marker DRAQ5 imaged with an open-top dual-view and dual-illumination LSM [1]. (b) Breast cancer spheroid [2]. (c) Primary pancreatic ductal adenocarcinoma organoids imaged with confocal microscopy [2]. (d) Human colon organoid imaged with an LSM laser scanning confocal microscope [2]. (e) Monolayer HCT116 cells imaged with an LSM laser scanning confocal microscope [2]. (f) Fixed zebrafish embryo stained for nuclei and imaged with a Zeiss LSM 880 confocal microscope [3].

      Besides, the nuclei segmentation component lacks benchmarking against existing methods.

      We agree with the reviewer that a benchmark against existing segmentation methods would be very useful. We tried different pre-trained models:

      - CellPose, which we tested in a previous paper ([4]) and which performed poorly compared to our trained StarDist3D model.

      - DeepStar3D ([2]) is only available in the software 3DCellScope. We could not benchmark the model on our data, because the free and accessible version of the software is limited to small datasets. An image of a single whole-mount gastruloid with one channel, with dimensions (347, 467, 477), was too large to be processed (see screenshot below). The segmentation model could not be extracted from the source code and tested externally because the trained DeepStar3D weights are encrypted.

      Author response image 2.

      Screenshot of the 3DCellScope software. We could not perform 3D nuclei segmentation of a whole-mount gastruloid because the image size was too large to be processed.

      - AnyStar ([5]), which is a model trained from the StarDist3D architecture, did not perform well on our data because of the heterogeneous stainings. Basic pre-processing such as median and Gaussian filtering did not improve the results and led to wrong segmentation of touching nuclei. AnyStar was demonstrated to segment colon organoids well in Ong et al., 2025 ([2]), but the nuclei were more homogeneously stained. Our Hoechst staining displays bright chromatin spots that are incorrectly labeled as individual nuclei.

      - Cellos ([6]), another model trained from StarDist3D, also did not perform well. The objects used for training and to validate the results are sparse and not touching, so the predicted segmentation has a lot of false negatives even when lowering the probability threshold to detect more objects. Additionally, the network was trained with an anisotropy of (9,1,1), based on images with low z resolution, so it performed poorly on almost isotropic images. Adapting our images to the network's anisotropy results in an imprecise segmentation that cannot be used to measure 3D nuclei deformations.

      We tried both Cellos and AnyStar predictions on a gastruloid image from Fig. S2 of our main manuscript. Author response image 3 displays the results qualitatively compared to our trained model Stardist-tapenade. For the revision of the paper, we will perform a comprehensive benchmark of these state-of-the-art routines, including quantitative assessment of the performance.

      Author response image 3.

      Qualitative comparison of two published segmentation models versus our model. We show one slice from the XY plane for simplicity. Segmentations are displayed with their contours only. (Top left) Gastruloid stained with Hoechst, image extracted from Fig. S2 of our manuscript. (Top right) Same image overlaid with the prediction from the Cellos model, showing many false negatives. (Bottom left) Same image overlaid with the prediction from our Stardist-tapenade model. (Bottom right) Same image overlaid with the prediction from the AnyStar model; false positives are indicated with a red arrow.

      Appraisal:

      The authors set out to establish a quantitative imaging and analysis pipeline for gastruloids using dual-view two-photon microscopy, spectral unmixing, and a custom computational framework for 3D segmentation and gene expression analysis. This aim is largely achieved. The integration of experimental and computational modules enables high-resolution in toto imaging and robust quantitative analysis at the single-cell level. The data presented support the authors' conclusions regarding the ability to capture spatial patterns of gene expression and cellular morphology across developmental stages.

      Impact and utility:

      This work presents a compelling and broadly applicable methodological advance. The approach is particularly impactful for the developmental biology community, as it allows researchers to extract quantitative information from high-resolution images to better understand morphogenetic processes. The data are publicly available on Zenodo, and the software is released on GitHub, making them highly valuable resources for the community.

      We thank the reviewer for this positive feedback.

      Reviewer #3 (Public review):

      Summary

      The paper presents an imaging and analysis pipeline for whole-mount gastruloid imaging with two-photon microscopy. The presented pipeline includes spectral unmixing, registration, segmentation, and a wavelength-dependent intensity normalization step, followed by quantitative analysis of spatial gene expression patterns and nuclear morphometry on a tissue level. The utility of the approach is demonstrated by several experimental findings, such as establishing spatial correlations between local nuclear deformation and tissue density changes, as well as the radial distribution pattern of mesoderm markers. The pipeline is distributed as a Python package, notebooks, and multiple napari plugins.

      Strengths

      The paper is well-written with detailed methodological descriptions, which I think would make it a valuable reference for researchers performing similar volumetric tissue imaging experiments (gastruloids/organoids). The pipeline itself addresses many practical challenges, including resolution loss within tissue, registration of large volumes, nuclear segmentation, and intensity normalization. Especially the intensity decay measurements and wavelength-dependent intensity normalization approach using nuclear (Hoechst) signal as reference are very interesting and should be applicable to other imaging contexts. The morphometric analysis is equally well done, with the correlation between nuclear shape deformation and tissue density changes being an interesting finding. The paper is quite thorough in its technical description of the methods (which are a lot), and their experimental validation is appropriate. Finally, the provided code and napari plugins seem to be well done (I installed a selected list of the plugins and they ran without issues) and should be very helpful for the community.

      We thank the reviewer for his positive feedback and appreciation of our work.

      Weaknesses

      I don't see any major weaknesses, and I would only have two issues that I think should be addressed in a revision:

      (1) The demonstration notebooks lack accompanying sample datasets, preventing users from running them immediately and limiting the pipeline's accessibility. I would suggest including a (selective) demo dataset that can be used to run the notebooks (e.g. for spectral unmixing) and/or providing easily accessible demo input sample data for the napari plugins (I saw that there is some sample data for the processing plugin, so this maybe could already be used for the notebooks?).

      We thank the reviewer for this relevant suggestion. The 7 notebooks were updated to automatically download sample tests. The different parts of the pipeline can now be run immediately: https://github.com/GuignardLab/tapenade/tree/chekcs_on_notebooks/src/tapenade/notebooks

      (2) The results for the morphometric analysis (Figure 4) seem to be only shown in lateral (xy) views without the corresponding axial (z) views. I would suggest adding this to the figure and showing the density/strain/angle distributions for those axial views as well.

      We agree with the reviewer that a morphometric analysis based on the axial views would be informative and plan to perform this analysis for the revision.

      (1) Moos, F., Suppinger, S., de Medeiros, G., Oost, K.C., Boni, A., Rémy, C., Weevers, S.L., Tsiairis, C., Strnad, P. and Liberali, P., 2024. Open-top multisample dual-view light-sheet microscope for live imaging of large multicellular systems. Nature Methods, 21(5), pp.798-803.

      (2) Ong, H.T., Karatas, E., Poquillon, T., Grenci, G., Furlan, A., Dilasser, F., Mohamad Raffi, S.B., Blanc, D., Drimaracci, E., Mikec, D. and Galisot, G., 2025. Digitalized organoids: integrated pipeline for high-speed 3D analysis of organoid structures using multilevel segmentation and cellular topology. Nature Methods, 22(6), pp.1343-1354.

      (3) Li, L., Wu, L., Chen, A., Delp, E.J. and Umulis, D.M., 2023. 3D nuclei segmentation for multi-cellular quantification of zebrafish embryos using NISNet3D. Electronic Imaging, 35, pp.1-9.

      (4) Vanaret, J., Dupuis, V., Lenne, P. F., Richard, F., Tlili, S., & Roudot, P. (2023). A detector-independent quality score for cell segmentation without ground truth in 3D live fluorescence microscopy. IEEE Journal of Selected Topics in Quantum Electronics, 29(4: Biophotonics), 1-12.

      (5) Dey, N., Abulnaga, M., Billot, B., Turk, E. A., Grant, E., Dalca, A. V., & Golland, P. (2024). AnyStar: Domain randomized universal star-convex 3D instance segmentation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (pp. 7593-7603).

      (6) Mukashyaka, P., Kumar, P., Mellert, D. J., Nicholas, S., Noorbakhsh, J., Brugiolo, M., ... & Chuang, J. H. (2023). High-throughput deconvolution of 3D organoid dynamics at cellular resolution for cancer pharmacology with Cellos. Nature Communications, 14(1), 8406.

      (7) Rakhymzhan, A., Leben, R., Zimmermann, H., Günther, R., Mex, P., Reismann, D., ... & Niesner, R. A. (2017). Synergistic strategy for multicolor two-photon microscopy: application to the analysis of germinal center reactions in vivo. Scientific Reports, 7(1), 7101.

      (8) Dunsing, V., Petrich, A., & Chiantia, S. (2021). Multicolor fluorescence fluctuation spectroscopy in living cells via spectral detection. eLife, 10, e69687.

    1. Author Response:

      Reviewer #1 (Public Review):

      Summary:

      This manuscript reports the substrate-bound structure of SiaQM from F. nucleatum, which is the membrane component of a Neu5Ac-specific Tripartite ATP-independent Periplasmic (TRAP) transporter. Until recently, there was no experimentally derived structural information regarding the membrane components of the TRAP transporter, limiting our understanding of the transport mechanism. Since 2022, there have been 3 different studies reporting the structures of the membrane components of Neu5Ac-specific TRAP transporters. While it was possible to narrow down the binding site location by comparing the structures to proteins of the same fold, a structure with substrate bound has been missing. In this work, the authors report the Na+-bound state and the Na+ plus Neu5Ac state of FnSiaQM, revealing information regarding substrate coordination. In previous studies, 2 Na+ ion sites were identified. Here, the authors also tentatively assign a 3rd Na+ site. The authors reconstitute the transporter to assess the effects of mutating the binding site residues they identified in their structures. Of the 2 positions tested, only one of them appears to be critical to substrate binding.

      Strengths:

      The main strength of this work is the capture of the substrate-bound state of SiaQM, which provides insight into an important part of the transport cycle.

      Weaknesses:

      The main weakness is the lack of experimental validation of the structural findings. The authors identified the Neu5Ac binding site, but only tested 2 residues for their involvement in substrate interactions, which was very limited. The authors tentatively identified a 3rd Na+ binding site, which if true would be an impactful finding, but this site was not tested for its contribution to Na+ dependent transport, and the authors themselves report that the structural evidence is not wholly convincing. This lack of experimental validation undermines the confidence of the findings. However, the reporting of these new data is important as it will facilitate follow-up studies by the authors or other researchers.

      The main concern, also mentioned by other reviewers, is the lack of mutational data and functional studies on the identified binding sites. Two other structures of TRAP transporters have been determined, one from Haemophilus influenzae (Hi) and the other from Photobacterium profundum (Pp). We will refer to the references in this paper as [1], Peter et al. as [2], and Davies et al. as [3]. The table below lists all the mutations made in the Neu5Ac binding site, including direct polar interactions between Neu5Ac and the side chains, as well as the newly identified metal sites.

      The Fusobacterium nucleatum (Fn) protein whose structure we have reported shares significant sequence identity with the protein in the previously reported Hi structure. When we superimpose the Pp and Fn structures, we observe that nearly all the residues that bind Neu5Ac and form the third metal site are conserved. This suggests that mutagenesis and functional studies from other research can be related to the structure presented in our work.

      The table below shows that all three residues that directly interact with Neu5Ac have been tested by site-directed mutagenesis for their role in Neu5Ac transport. Both D521 and S300 are critical for transport, while S345 is not. We do not believe that a D521A mutation in Fn, followed by transport studies, will provide any new information.

      However, Peter et al. have mutated only one of the 5 residues near the newly identified metal binding site, which resulted in no transport. The rest of the residues have not been functionally tested. We propose to mutate these residues into Ala, express and purify the proteins, and then carry out transport assays on those that show expression. We will include this information in the revised manuscript.

      Reviewer #2 (Public Review):

      In this exciting new paper from the Ramaswamy group at Purdue, the authors provide a new structure of the membrane domains of a tripartite ATP-independent periplasmic (TRAP) transporter for the important sugar acid, N-acetylneuraminic acid or sialic acid (Neu5Ac). While there have been a number of other structures in the last couple of years (the first for any TRAP-T) this is the first to trap the structure with Neu5Ac bound to the membrane domains. This is an important breakthrough as in this system the ligand is delivered by a substrate-binding protein (SBP), in this case, called SiaP, where Neu5Ac binding is well studied but the 'hand over' to the membrane component is not clear. The structure of the membrane domains, SiaQM, revealed strong similarities to other SBP-independent Na+-dependent carriers that use an elevator mechanism and have defined Na+ and ligand binding sites. Here they solve the cryo-EM structure of the protein from the bacterial oral pathogen Fusobacterium nucleatum and identify a potential third (and theoretically predicted) Na+ binding site but also locate for the first time the Neu5Ac binding site. While this sits in a region of the protein that one might expect it to sit, based on comparison to other transporters like VcINDY, it provides the first molecular details of the binding site architecture and identifies a key role for Ser300 in the transport process, which their structure suggests coordinates the carboxylate group of Neu5Ac. The work also uses biochemical methods to confirm the transporter from F. nucleatum is active and similar to those used by selected other human and animal pathogens and now provides a framework for the design of inhibitors of these systems.

      The strengths of the paper lie in the locating of Neu5Ac bound to SiaQM, providing important new information on how TRAP transporters function. The complementary biochemical analysis also confirms that this is not an atypical system and that the results are likely true for all sialic acid-specific TRAP systems.

      The main weakness is the lack of follow-up on the identified binding site in terms of structure-function analysis. While Ser300 is shown to be important, only one other residue is mutated and a much more extensive analysis of the newly identified binding site would have been useful.

      Please see the comments above.

      Reviewer #3 (Public Review):

      The manuscript by Goyal et al reports substrate-bound and substrate-free structures of a tripartite ATP-independent periplasmic (TRAP) transporter from a previously uncharacterized homolog, F. nucleatum. This is one of the most mechanistically fascinating transporter families, by means of its QM domain (the domain reported in this manuscript) operating as a monomeric 'elevator', and its P domain functioning as a substrate-binding 'operator' that is required to deliver the substrate to the QM domain; together, this is termed an 'elevator with an operator' mechanism. Remarkably, previous structures had not demonstrated the substrate Neu5Ac bound. In addition, they confirm the previously reported Na+ binding sites and report a new metal binding site in the transporter, which seems to be mechanistically relevant. Finally, they mutate the substrate binding site and use proteoliposomal uptake assays to show the mechanistic relevance of the proposed substrate binding residues.

      The structures are of good quality, the functional data is robust, the text is well-written, and the authors are appropriately careful with their interpretations. Determination of a substrate-bound structure is an important achievement and fills an important gap in the 'elevator with an operator' mechanism. Nevertheless, I have concerns with the data presentation, which in its current state does not intuitively demonstrate the discussed findings. Furthermore, the structural analysis appears limited, and even slight improvements in data processing and resulting resolution would greatly improve the authors' claims. I have several suggestions to hopefully improve the clarity and quality of the manuscript.

      We appreciate your feedback and will make the necessary modifications to the manuscript incorporating most of the suggestions. We will submit the revised version once the experiments are completed. We are also working on improving the quality of the figures and have made several attempts to enhance the resolution using CryoSPARC or RELION, but without success. We will continue to explore newer methods in an effort to achieve higher resolution and to model more lipids, particularly in the binding pocket.

    1. Reviewer #3 (Public Review):

      The manuscript presents an intriguing explanation for why grid cell firing fields do {\em not} lie on a lattice whose axes are aligned to the walls of a square arena. This observation, by itself, merits the manuscript's dissemination to the journal's audience.

      The presentation is quirky (but keep the quirkiness!).

      But let me recast the problem presented by the authors as one of combinatorics. Given repeating, spatially separated firing fields across cells, one obtains temporal sequences of grid cells firing. Label these cells by integers from $[n]$. Any two cells firing in succession should uniquely identify one of six directions (from the hexagonal lattice) in which the agent is currently moving.

      Now, take the symmetric group $\Sigma$ of cyclic permutations on $n$ elements. We ask whether there are cyclic permutations of $[n]$ such that no two elements that are adjacent in the original cyclic order remain adjacent.

      So, for instance, $(4,2,3,1)$ would not be counted as a valid permutation of $(1,2,3,4)$, as $(2,3)$ and $(1,4)$ are adjacent.
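
      A brute-force sketch can make this counting problem concrete. It assumes the condition is that no two elements adjacent in the original cyclic order $(1,\dots,n)$ remain adjacent in the permutation; the function names are illustrative, and the additional requirement that successive firings identify one of six directions is not encoded here.

```python
from itertools import permutations


def preserves_adjacency(order):
    """Return True if any cyclic neighbours in `order` are also
    cyclic neighbours in the original order 1, 2, ..., n."""
    n = len(order)
    for i in range(n):
        a, b = order[i], order[(i + 1) % n]
        # a and b are neighbours in 1..n (cyclically) iff they differ by +/-1 mod n
        if (a - b) % n in (1, n - 1):
            return True
    return False


def adjacency_free_cycles(n):
    """List cyclic arrangements of 1..n with no preserved adjacency.

    Rotations are factored out by fixing 1 in the first position;
    a cycle and its mirror image are listed separately.
    """
    return [
        (1,) + rest
        for rest in permutations(range(2, n + 1))
        if not preserves_adjacency((1,) + rest)
    ]


if __name__ == "__main__":
    for n in range(4, 8):
        print(n, len(adjacency_free_cycles(n)))
```

      Under this assumption, $(4,2,3,1)$ is rejected and no valid arrangement exists for $n=4$. Because mirror images are listed separately, each undirected cycle appears twice in the output.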

      Furthermore, given $[n]$, are there two distinct cyclic permutations such that {\em no} adjacencies are preserved when considering any pair of permutations (among the triple of the original ordered sequence and the two permutations)? In other words, if we consider the permutation required to take the first permutation into the second, that permutation should not preserve any adjacencies.

      {\bf Key question}: is there any difference between the solution to the combinatorics problem sketched above and the result in the manuscript? Specifically, the text argues that for $n=7$ there is only {\em one} solution.

      Ideally, one would strive to obtain a closed-form solution for the number of such permutations as a function of $n$.

    1. Author Response

      Joint Public Review

      Strengths

      Overall, the idea that the PAG interacts with the BLA via the midline thalamus during a predator vs. foraging test is new and quite interesting. The authors have used appropriate tools to address their questions. The major impact in the field would be to add evidence to claims that the BLA can be downstream of the dPAG to evoke defensive behaviors. The study also adds to a body of evidence that the PAG mediates primal fear responses.

      Weaknesses

      (Anatomical concerns)

      1) The authors claim that the recordings were performed in the dorsal PAG (dPAG), but the histological images in Fig. 1B and Supplementary S2 for example show the tip of the electrode in a different subregion of PAG (ventral/lateral). They should perform a more careful histological analysis of the recording sites and explain the histological inclusion and exclusion criteria. Diagrams showing the sites of all PAG and BLA recordings, as well as all fiber optics, would be helpful.

      The PAG is composed of dorsomedial (dm), dorsolateral (dl), lateral (l), and ventrolateral (vl) columns that extend along the rostro-caudal axis of the aqueduct. The term “dorsal PAG” (dPAG) generally encompasses dmPAG, dlPAG, and lPAG, as substantiated by tract-tracing, neurochemical, and immunohistochemical techniques (e.g., Bandler et al., 1991; Bandler & Keay, 1996; Carrive, 1993). As Bandler and Shipley (1994) summarized, “These findings suggest that what has been traditionally called the 'dorsal PAG' (a collective term for regions dorsal and lateral to the aqueduct), consists of three anatomically distinct longitudinal columns: dorsomedial and lateral columns…and a dorsolateral column…" Similarly, Schenberg et al. (2005) clarified in their review that, “According to this parcellation...the defensive behaviors (freezing, flight or fight) and aversion-related responses (switchoff behavior) were ascribed to the DMPAG, DLPAG, and LPAG (usually named the ‘dorsal’ PAG).” In our study, all recordings were conducted within the dPAG. Also, Figures 1B and S2 in our manuscript correspond to the -6.04 mm template from Paxinos & Watson’s atlas (1998), which is shown in the left panel in Author response image 1 and is considerably anterior to the location where the vlPAG emerges, as shown in the right panel. In our revised manuscript, we will provide a detailed definition of the dPAG, inclusive of dmPAG, dlPAG, and lPAG, and support this with the referenced literature.

      Author response image 1.

      2) Prior studies investigating the role of BLA neurons during a foraging vs. robot test similar to the one used in this study should be also cited and discussed (e.g., Amir et al 2019; Amir et al 2015). These two studies demonstrated that most neurons in the basal portion of the BLA exhibit inhibitory activity during foraging behavior and only a small fraction of neurons (~4%) display excitatory activity in response to the robot (in contrast to the 25% reported in the present study). A very accurate histological analysis of BLA recording sites should be performed to clarify whether distinct subregions of the BLA encode foraging and predator-related information, as previously shown in the two described studies.

      In the revised manuscript, we will discuss papers by Amir et al. (2015) and Amir et al. (2019) that utilized a similar 'approach food-avoid predator' paradigm. These studies found a correlation between the neuronal activities in the basolateral amygdala (BL) and the velocity of animal movement during foraging, regardless of the presence or absence of predators. Specifically, the majority of BL neurons were inhibited in both conditions, with only 4.5% being responsive to predators. Consequently, Amir et al. posited that amygdala activity predominantly aligns with behavioral output such as foraging, rather than with responses to threats.

      In contrast, our body of work (Kim et al., 2018; Kong et al., 2021; the present study) reveals that the majority of neurons in the BA/BLA displayed distinct responses in pre-robot and robot sessions. Kong et al. (2021) discussed in depth several factors that may account for this discrepancy, given that both Amir et al. and our research used similar behavioral paradigms. Differences in apparatus features, experimental procedures, and data analysis methodologies (refer to Amir et al., 2019) could be contributing to the conflicting results and interpretations concerning the significance of amygdalar neuronal activities.

      Additionally, our studies uniquely monitored the same set of amygdalar neurons during pre-robot and robot sessions, affording us the opportunity for a direct comparison of neuronal activities under different threat conditions.

      Another salient difference lies in the foraging success rates, which were markedly higher in Amir et al. (~80%) compared to our studies (<3-4%). We hypothesize that there may be an inverse relationship between the pellet procurement rate and the intensity of fear. The high foraging success rate in Amir et al., which correlates with subdued amygdalar activity, stands in contrast to our findings of heightened amygdalar activity associated with a lower foraging success rate. Supporting this notion, optogenetically induced amygdalar activity led naïve rats to abandon foraging and escape to the nest (Kong et al., 2021, the present study).

      3) An important claim of this study is that the PAG sends predator-related signals to BLA via the PVT (Fig. 4). The authors stated that PVT neurons labeled by intra-BLA injection of the retrograde tracer CTB were activated by the predator, but a proper immunohistochemical quantification with a control group was not provided to support this claim. To provide better support for their claim, the authors should quantify the double-labeled PVT neurons (cFos plus CTB positive neurons) during the robot test.

      As recommended, we will include a revised Fig. 4 in the manuscript to present the quantification of neurons that are double-labeled with c-Fos and CTB in the PVT. This updated figure will provide a more rigorous analysis and visual representation of the data.

      4) The AAV anterograde tracer deposit spread to a large part of the PAG, including dorsolateral and lateral PAG, and supraoculomotor regions (Fig. 4B). Is the projection to the PVT from the dPAG or other regions of the PAG?

      As previously addressed in response to Comment #1, the dPAG comprises the dmPAG, dlPAG, and lPAG. In the revised manuscript, we will acknowledge the diffusion of the AAV to the adjacent deep gray layer of the superior colliculus. Additionally, we are considering conducting more restricted AAV injections into the dPAG to verify terminal expressions in the PVT.

      (Concerns about the strength of the evidence supporting a role for the PVT)

      5) The authors conclude in the discussion section that the dPAG-amygdala pathway is involved in generating antipredatory defensive behavior. However, the current results are entirely based on correlational analyses of neural firing rate and there is no direct demonstration that the PAG provides information about the robot to the BLA. Therefore, the authors should tone down their interpretation or provide more evidence to support it by performing experiments applying inhibitory tools in the dPAG > PVT > BLA pathway and examining the impact on behavior and downstream neural firing.

      As suggested, we will moderate the assertions about the functional implications of the PVT, based on the data from anterograde and retrograde tracers, to present a more measured interpretation in the manuscript.

      (Other concerns)

      6) One of the main findings of this study is the observation that BLA neurons that are responsive to PAG photostimulation are preferentially recruited during the foraging vs. robot test (Fig. 3). However, the experimental design used to address this question is problematic because the laser photostimulation of PAG neurons preceded the foraging vs. robot test. Prior photoactivation of PAG may have caused indirect short-term synaptic plasticity in BLA cells, which would favor the response of these cells to the robot. Please see Oishi et al, 2019 PMID: 30621738, which demonstrated that 10 trains of 20-Hz photoactivation (300 pulses each) were sufficient to induce LTP in brain slices.

      After approximately eight photostimulation trials of the dPAG, with 40 pulses each, the animals entered a post-photostimulation testing phase (referred to as "Post"; Fig. 3C), lasting 10-15 minutes over an average of eight trials before robot testing. Although the PAG does not directly project to the BLA, the remote possibility of trans-synaptic plasticity in the BLA cannot be completely excluded and will be acknowledged. Additionally, it is noteworthy that Oishi et al.'s (2019) study applied a total of 3,000 pulses (i.e., 10 15-s trains of 20-Hz pulses) and investigated CA3-CA3 synaptic plasticity, as opposed to a total of 320 pulses (i.e., 8 2-s trains of 20-Hz pulses) in our study.
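
      The two pulse totals compared here follow from multiplying the number of trains by the train duration and the pulse frequency. A minimal sketch of that arithmetic, using only the figures quoted in this response (the function name is illustrative):

```python
def total_pulses(n_trains, train_duration_s, rate_hz):
    """Total light pulses delivered: trains x seconds per train x pulses per second."""
    return n_trains * train_duration_s * rate_hz


# Oishi et al. (2019): 10 trains, 15 s each, at 20 Hz
oishi_total = total_pulses(10, 15, 20)

# Present study: 8 trains, 2 s each, at 20 Hz (40 pulses per train)
present_total = total_pulses(8, 2, 20)

print(oishi_total, present_total)
```

      The second call matches the "40 pulses each" stated for the eight dPAG photostimulation trials.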

      7) The authors should perform a longitudinal analysis of the behavioral responses of the rats across the trials to clarify whether the animals habituate to the robot or not. In Figure 1E, it appears that PAG neurons fire less across the trials, which could be associated with behavioral habituation to the predator robot. If that is the case, the activity of many other PAG and BLA neurons will also most likely vary according to the trial number, which would impact the current interpretation of the results.

      In Figure 1E, the y-axis represents the Z scores of individual dPAG neurons, instead of representing repeated tests of the same neuron across multiple trials. The raster plot in Figure 1F clearly depicts that the same dPAG neurons consistently display heightened neural activity in response to the approaching robot across successive trials.

      8) In Figure 1, it is unclear why the authors compared the activity of neurons that respond to the robot activation against the activity of the neurons during the retrieval of the food pellets in the pre-robot and post-robot sessions. The best comparison would be aligning the cells that were responsive to the activation of the robot with the moment in which the animals run back to the nest after consuming the pellets during the pre-robot or post-robot sessions. This would enable the authors to demonstrate that the PAG responses are directly associated with the expression of escaping behavior in the presence of the robot rather than associated with the onset of goal-directed movement toward the nest during the pre- and post-robot sessions. A graphic showing the correlation between PAG firing rate and escape response would be also informative.

      Figure 1E compares the dPAG neural activity when animals enter a designated pellet zone (time-stamped by camera tracking) during both pre-robot and post-robot trials to the dPAG neural activity when entering the robot trigger zone (time-stamped by robot activation). We wish to clarify that rats carry the large (0.5 g) pellet back to the nest for consumption rather than consume it in the open arena before returning to the nest.

      In our study, we aimed to investigate the direct response of dPAG neurons to the looming predator and explore the communication between dPAG and BLA in relation to antipredatory defensive responses. To build upon our previous research that suggests a potential role of dPAG in conveying such responses to the BLA (Kim et al., 2013) and the immediate firing of BLA neurons in response to predatory threats (Kim et al., 2018; Kong et al., 2021), we chose to narrow our testing window to a short latency period (< 500 ms) following robot activations. This specific time window allowed us to focus on the initial stages of the threat stimulus processing and minimize potential confounding factors such as the presence of residual firing activity triggered by the robot during the animals’ escape or any activity changes induced by the animals' behavior.

      Furthermore, Figure S1C clearly demonstrates that (i) increased activity of dPAG robot cells preceded the animals’ actual turning and fleeing behavior toward the nest, as indicated by the peak values of movement speed (dark yellow), and (ii) the presence of pellets did not affect activity changes of the robot cells during pre- and post-robot sessions. These observations suggest that the heightened activity of dPAG robot cells was not due to movement changes or pellet motivation.

      Lastly, as stated in the original manuscript, the vast majority of robot cells (90.9%) did not show significant correlations between movement speed and firing rates, lending further support to the interpretation that the dPAG activity observed was not merely a reflection of movement changes.

      References

      Bandler, R., Carrive, P., & Depaulis, A. (1991). Emerging principles of organization of the midbrain periaqueductal gray matter. The midbrain periaqueductal gray matter: functional, anatomical, and neurochemical organization, 1-8.

      Bandler, R. & Keay, K. A. (1996). Columnar organization in the midbrain periaqueductal gray and the integration of emotional expression. Progress in brain research, 107, 285-300.

      Bandler, R. & Shipley, M. T. (1994) Columnar organization in the midbrain periaqueductal gray: modules for emotional expression? Trends in Neurosciences, 17(9), 379-89.

      Carrive, P. (1993). The periaqueductal gray and defensive behavior: functional representation and neuronal organization. Behavioural brain research, 58(1-2), 27-47.

      Oishi, N., Nomoto, M., Ohkawa, N., Saitoh, Y., Sano, Y., Tsujimura, S., ... & Inokuchi, K. (2019). Artificial association of memory events by optogenetic stimulation of hippocampal CA3 cell ensembles. Molecular brain, 12, 1-10.

      Paxinos, G. & Watson, C. (1998). The Rat Brain in Stereotaxic Coordinates. Academic Press, San Diego.

      Schenberg, L. C., Póvoa, R. M. F., Costa, A. L. P., Caldellas, A. V., Tufik, S., & Bittencourt, A. S. (2005). Functional specializations within the tectum defense systems of the rat. Neuroscience & Biobehavioral Reviews, 29(8), 1279-1298.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This work by Ding et al uses agent-based simulations to explore the role of the structure of molecular motor myosin filaments in force generation in cytoskeletal structures. The focus of the study is on disordered actin bundles which can occur in the cell cytoskeleton and have also been investigated with in vitro purified protein experiments.

      Strengths:

      The key finding is that cooperative effects between multiple myosin filaments can enhance both total force and the efficiency of force generation (force per myosin). These trends were possible to obtain only because the detailed structure of the motor filaments with multiple heads is represented in the model.

      We appreciate your comments about the strength of our study.

      Weaknesses:

      It is not clearly described what scientific/biological questions about cellular force production the work answers. There should be more discussion of how their simulation results compare with existing experiments or can be tested in future experiments.

      Thank you for the comment. First, our study explains why non-muscle myosin II in stress fibers shows focal distributions rather than uniform distributions; if the filaments stay close together, they can generate much larger forces in the stress fibers via the cooperative overlap. Our study also predicts a difference between bipolar structures (found in skeletal muscle myosins and non-muscle myosins) and side polar structures (found in smooth muscle myosins) in terms of the likelihood of the cooperative overlap. As shown below, myosin filaments with the bipolar structure can add up their forces better than those with the side polar structure when their overlap level is the same. We will add discussion about these in the revised manuscript.

      Author response image 1.

      As the reviewer noticed, our results were briefly compared with prior observations in Ref. 4 (Thoresen et al., Biophys J, 2013) where different myosin isoforms were used for in vitro actin bundles. We will add more quantitative comparisons between the in vitro study and our results.

      In addition, at the end of the conclusion section, we suggested future experiments that can be used for verifying our results. In particular, experiments with synthetic myosin filaments with tunable geometry seem to be suitable for verifying our computational predictions and observations.

      The model assumptions and scientific context need to be described better.

      We apologize for the insufficient description of the model. We will revise those parts to better explain the model assumptions and scientific context.

      The network contractility seems to be a mere appendix to the bundle contractility which is presented in much more detail.

      We included some cases run with the two-dimensional network in this study to demonstrate the generality of our conclusions. We included only minimal preliminary results because we are currently working on a follow-up study with network structures. We hope that the reviewer will understand our intention and situation.

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors use a mechanical model to investigate how the geometry and deformations of myosin II filaments influence their force generation. They introduce a force generation efficiency that is defined as the ratio of the total generated force and the maximal force that the motors can generate. By changing the architecture of the myosin II filaments, they study the force generation efficiency in different systems: two filaments, a disorganized bundle, and a 2D network. In the simple two-filament systems, they found that in the presence of actin cross-linking proteins motors cannot add up their force because of steric hindrances. In the disorganized bundle, the authors identified a critical overlap of motors for cooperative force generation. This overlap is also influenced by the arrangement of the motor on the filaments and influenced by the length of the bare zone between the motor heads.
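
      As a reading aid, the efficiency defined in this summary can be written as a single ratio. The symbols below are assumed notation for illustration and are not taken from the manuscript:

```latex
\eta = \frac{F_{\text{total}}}{F_{\text{max}}}
```

      Here F_total denotes the total generated force and F_max the maximal force that the motors can generate.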

      Strengths:

      The strength of the study is the identification of organizational principles in myosin II filaments that influence force generation. It provides a complementary mechanistic perspective on the operation of these motor filaments. The force generation efficiency and the cooperative overlap number are quantitative ways to characterize the force generation of molecular motors in clusters and between filaments. These quantities and their conceptual implications are most likely also applicable in other systems.

      Thank you for the comments about the strength of our study.

      Weaknesses:

      The detailed model that the authors present relies on over 20 numerical parameters that are listed in the supplement. Because of this vast amount of parameters, it is not clear how general the findings are. On the other hand, it was not obvious how specific the model is to myosin II, meaning how well it can describe experimental findings or make measurable predictions. The model seems to be quantitative, but the interpretation and connection to real experiments are rather qualitative in my point of view.

      As the reviewer mentioned, all agent-based computational models for simulating the actin cytoskeleton inevitably involve a large number of parameters. Some of the parameter values are not well known, so we have tuned them carefully by comparing our results with experimental observations in our previous studies since 2009.

      We were aware of the importance of a rigorous representation of the unbinding and walking rates of myosin motors, so we implemented the parallel cluster model, which can predict those rates by considering the mechanochemical rates of myosin II, in our model. Thus, we are confident that our motors represent myosin II.

      In our manuscript, our results were compared with prior observations in Ref. 4 (Thoresen et al., Biophys J, 2013) several times. In particular, larger force generation with more myosin heads per thick filament was consistent between the experiment and our simulations.

      Our study can make various predictions. First, our study explains why non-muscle myosin II in stress fibers shows focal distributions rather than uniform distributions; if the filaments stay close together, they can generate much larger forces in the stress fibers via cooperative overlap. Our study also predicts a difference between bipolar structures (found in skeletal muscle myosins and non-muscle myosins) and side polar structures (found in smooth muscle myosins) in terms of the likelihood of cooperative overlap. As shown in Author response image 1, myosin filaments with the bipolar structure can add up their forces better than those with the side polar structure when their overlap level is the same. We will add a discussion of these points to the revised manuscript.

      It was often difficult for me to follow what parameters were changed and what parameters were set to what numerical values when inspecting the curves shown in the figures. The manuscript could be more specific by explicitly giving numbers. For example, in the caption for Figure 6, instead of saying "is varied by changing the number of motor arms, the bare zone length, the spacing between motor arms", the authors could be more specific and give the ranges: "is varied by changing the number of motor arms from ... to ..., the bare zone length from ... to ..., and the spacing between motor arms from ... to ...".

      This lack of specificity is also reflected in the text: "We ran simulations with a variation in either L<sub>sp</sub> or L<sub>bz</sub>". What is the range of this variation? "When L<sub>M</sub> was similar": similar to what? "despite different N<sub>M</sub>." What are the different values for N<sub>M</sub>? These are only a few examples that show that the text could be far more specific and quantitative instead of relying on qualitative descriptions.

      We appreciate the comment. We will specify the range of the variation in each parameter in the revised manuscript.

      In the text, after equation (2) the authors discuss assumptions about the binding of the motor to the actin filament. I think these model-related assumptions and explanations should be discussed not in the results section but rather in the "model overview" section.

      Thank you for pointing this out. We will reorganize the text in the revised manuscript.

      The lines with different colors in Figure 2A are not explained. What systems and parameters do they represent?

      The different colors in Fig. 2A distinguish the 20 cases. We will add an explanation of the colors to the figure caption in the revised manuscript.

    1. Author response:

      We thank the reviewers for their support of this work and insightful recommendations for how to improve it. We have provided specific responses to each reviewer comment below. To summarize how we intend to address the requested revisions:

      Many of the reviewers’ comments requested additional technical or quality details about the DMS libraries or assays (e.g., number of cells tested, number of sequencing reads, assay replication, assay sensitivity, library balance), and we provide additional information and analyses that we can incorporate into the relevant portions of the text, supplementary tables, and supplementary figures to address these questions.

      Some comments asked to clarify nomenclature/wording or provide additional labels to images, and we will make these changes as requested.

      A few questions would require additional experimental data to address. Where experiments have already been performed, we will incorporate those results or cite relevant work previously reported in the literature.

      Reviewer 1:

      Summary

      Howard et al. performed deep mutational scanning on the MC4R gene, using a reporter assay to investigate two distinct downstream pathways across multiple experimental conditions. They validated their findings with ClinVar data and previous studies. Additionally, they provided insights into the application of DMS results for personalized drug therapy and differential ligand responses across variant types.

      Strengths

      They captured over 99% of variants with robust signals and investigated subtle functionalities, such as pathway-specific activities and interactions with different ligands, by refining both the experimental design and analytical methods.

      Weaknesses

      While the study generated informative results, it lacks a detailed explanation regarding the input library, replicate correlation, and sequencing depth for a given number of cells.

      Additionally, there are several questions that it would be helpful for authors to clarify.

      (1) It would be helpful to clarify the information regarding the quality of the input library and experimental replicates. Are variants evenly represented in the library? Additionally, have the authors considered using long-read sequencing to confirm the presence of a single intended variant per construct? Finally, could the authors provide details on the correlation between experimental replicates under each condition?

      Are variants evenly represented in the library?

      We strive to achieve as evenly balanced a library as possible at every stage of the DMS process (e.g., from initial cloning in E. coli through integration into human cells). Below is a representative plot showing the number of barcodes per amino acid variant at each position in a given ~60 amino acid subregion of MC4R, which highlights how evenly variants are represented at the E. coli cloning stage.

      Author response image 1.

      We also make similar measurements after the library is integrated into HEK293T cell lines, and see similarly even coverage across all variants, as shown in the plot below.

      Author response image 2.

      Additionally, have the authors considered using long-read sequencing to confirm the presence of a single intended variant per construct?

      We agree long-read sequencing would be an excellent way to confirm that our constructs contain a single intended variant. However, we elected for an alternate method (outlined in more detail in Jones et al. 2020) that leverages multiple layers of validation. First, the oligo chip-synthesized portions of the protein containing the variants are cloned into a sequence-verified plasmid backbone, which greatly decreases the chances of spuriously generating a mutation in a different portion of the protein. We then sequence both the oligo portion and random barcode using overlapping paired-end reads during barcode mapping to avoid sequencing errors and to help detect DNA synthesis errors. At this stage, we computationally reject any constructs that have more than one variant. Given this, the vast majority of remaining unintended variants would come from somatic mutations introduced by the E. coli cloning or replication process, which should be low frequency. We have used our in-house full plasmid sequencing method, OCTOPUS, to sample and spot check this for several other DMS libraries we have generated using the same cloning methods. We have found variants in the plasmid backbone in only ~1% of plasmids in these libraries. Our statistical model also helps correct for this by accounting for barcode-specific variation. Finally, we believe this provides further motivation for having multiple barcodes per variant, which dilutes the effect of any unintended additional variants.

      Finally, could the authors provide details on the correlation between experimental replicates under each condition?

      Certainly! In general, the Gs reporter had higher correlation between replicates than the Gq system (r ~ 0.5 vs r ~ 0.4). The plots below show two representative correlations of barcode read counts at the RNA-seq stage between replicates of the low α-MSH conditions. One important advantage of our statistical model is that it can leverage information from barcodes regardless of the number of replicates they appear in.

      Author response image 3.

      Since the functional readout of variants is conducted through RNA sequencing, it seems crucial to sequence a sufficient number of cells with adequate sequencing saturation. Could the authors clarify the coverage depth used for each RNA-seq experiment and how this depth was determined? Additionally, how many cells were sequenced in each experiment?

      This will be addressed by incorporating the following details into the manuscript:

      We seeded 17 million cells per replicate at the start of each assay and, with ~1.5x growth over the course of the assay, harvested ~25.5 million cells per replicate for RNA extraction and sequencing. We found this sufficient to get at least ~30-60x cellular coverage per amino acid variant.

      Total mapped reads per replicate at RNA-seq stage

      - Gs/CRE: 9.1-18.2 million mapped reads, median=12.3

      - Gq/UAS: 8.6-24.1 million mapped reads, median=14.5

      - Gs/CRE+Chaperone: 6.4-9.5 million mapped reads, median=7.5

      Reads per barcode distribution

      - Median read counts of 8, 10, and 6 reads per sample per barcode for Gs/CRE, Gq/UAS, and Gs/CRE+Chaperone assays, respectively.

      Barcodes per variant distribution

      - As reported, the median number of barcodes per variant across samples (the “median of medians”) is 56 for Gs/CRE and 28 for Gq/UAS

      - Additionally, it is 44 for Gs/CRE+Chaperone

      It appears that the frequencies of individual RNA-seq barcode variants were used as a proxy for MC4R activity. Would it be important to also normalize for heterogeneity in RNA-seq coverage across different cells in the experiment? Variability in cell representation (i.e., the distribution of variants across cells) could lead to misinterpretation of variant effects. For example, suppose barcode_a1 represents variant A and barcode_b1 represents variant B. If the RNA-seq results show 6 reads for barcode_a1 and 7 reads for barcode_b1, it might initially appear that both variants have similar effect sizes. However, if these reads correspond to 6 separate cells each containing 1 copy of barcode_a1, and only 1 cell containing 7 copies of barcode_b1, the interpretation changes significantly. Additionally, if certain variants occupy a larger proportion of the cell population, they are more likely to be overrepresented in RNA sequencing.

      We account for this heterogeneity in several ways. First, as shown above (Response to Reviewer 1, Question 1), we aim to have even representation of variants within our libraries. Second, we utilize compositional control conditions like forskolin or unstimulated conditions to obtain treatment-independent measurements of barcode abundance and, consequently, of mutant-vs-WT effects that are due to compositional rather than biological variability. We expect that variability observed under these controls is due to subtle effects of molecular cloning, gene expression, and stochasticity. Using these controls, we observe that mutant-vs-WT effects are generally close to zero in these normalization conditions (e.g., in untreated Gq, see Supplementary Figure 3) as compared to drug-treated conditions. For example, premature stops behave similarly to WT in normalization conditions. This indicates that mutant abundance is relatively homogeneous. Where there are barcode-dependent effects on abundance, we can use information from these conditions to normalize that effect. Finally, our mixed-effect model accounts for barcode-specific deviations from the expected mutant effect (e.g., a “high count” barcode consistently being high relative to the mean).
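
      The normalization idea described above can be illustrated with a minimal sketch. This is not the mixed-effect model used in the study; it only shows how a mutant-vs-WT log ratio measured under a compositional control could be subtracted from the same ratio under treatment. The function names, the pseudocount, and all read counts are hypothetical.

```python
import math


def log2_ratio(mutant_count, wt_count, pseudocount=0.5):
    """Mutant-vs-WT log2 ratio; the pseudocount (an assumption) avoids log(0)."""
    return math.log2((mutant_count + pseudocount) / (wt_count + pseudocount))


def normalized_effect(treated, control):
    """Treated mutant-vs-WT log ratio minus the same ratio under a control condition.

    Each argument is a (mutant_count, wt_count) pair of read counts.
    A purely compositional difference (same ratio in both conditions) cancels out.
    """
    return log2_ratio(*treated) - log2_ratio(*control)


# Hypothetical counts: the variant is under-represented only after treatment.
loss_like = normalized_effect(treated=(20, 80), control=(50, 50))

# Hypothetical counts: the variant is equally under-represented in both conditions,
# so the difference is compositional and the normalized effect is zero.
compositional_only = normalized_effect(treated=(40, 80), control=(40, 80))
```

      In this toy setting a negative value indicates lower reporter signal than WT after accounting for abundance, whereas a variant that is merely rare in the library scores zero.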

      Although the assay system appears to effectively represent MC4R functionality at the molecular level, we are curious about the potential disparity between the DMS score system and physiological relevance. How do variants reported in gnomAD distribute within the DMS scoring system?

      Figure 2D shows DMS scores (variant effect on Gs signaling) relative to human population frequency for all MC4R variants reported in gnomAD as of January 8, 2024.

      To measure Gq signaling, the authors used the GAL4-VPR relay system. Is there additional experimental data to support that this relay system accurately represents Gq signaling?

      The full Gq reporter uses an NFAT response element from the IL-2 promoter to regulate the expression of the GAL4-VPR relay. In this system, the activation of Gq signaling results in the activation of the NFAT response element, and this signal is then amplified by the GAL4-VPR relay. The NFAT response element has been previously well-validated to respond to the activation of Gq signaling (e.g., PMID: 8631834). We will add this reference to the text to further support the use of the Gq assay.

      Identifying the variants responsive to the corrector was impressive. However, we are curious about how the authors confirmed that the restoration of MC4R activity was due to the correction of the MC4R protein itself. Is there a possibility that the observed effect could be influenced by other factors affected by the corrector? When the corrector was applied to the cells, were any expected or unexpected differential gene expression changes observed?

      While we do not directly measure whether Ipsen-17 has effects on other signaling processes, previous work has shown that Ipsen-17 treatment does not indirectly alter signaling kinetics such as receptor internalization (Wang et al., 2014). Furthermore, our analysis methods inherently account for this by normalizing variant effects to WT signaling levels. Any observed rescue of a given variant inherently means that the variant is specifically more responsive to Ipsen-17 than WT, and the fact that different variants exhibit different levels of rescue is reassuring that the mechanism is on target to MC4R. Lastly, Ipsen-17 is known to be an antagonist of alpha-MSH activity and is thought to bind directly to the same site on MC4R (Wang et al., 2014).

      As mentioned in the introduction, gain-of-function (GoF) variants are known to be protective against obesity. It would be interesting to see further studies on the observed GoF variants. Do the authors have any plans for additional research on these variants?

      We agree this would be an excellent line of inquiry, but due to changes in company priorities we unfortunately do not have any plans for additional research on these variants.

      Reviewer 2:

      Overview

      In this manuscript, the authors use deep mutational scanning to assess the effect of ~6,600 protein-coding variants in MC4R, a G protein-coupled receptor associated with obesity. Reasoning that current deep mutational scanning approaches are insufficiently precise for some drug development applications, they focus on articulating new, more precise approaches. These approaches, which include a new statistical model and innovative reporter assay, enable them to probe molecular phenotypes directly relevant to the development of drugs that target this receptor with high precision and statistical rigor.

      They use the resulting data for a variety of purposes, including probing the relationship between MC4R's sequence and structure, analyzing the effect of clinically important variants, identifying variants that disrupt downstream MC4R signaling via one but not both pathways, identifying loss of function variants that are amenable to a corrector drug, and exploring how deep mutational scanning data could guide small molecule drug optimization.

      Strengths

      The analysis and statistical framework developed by the authors represent a significant advance. In particular, the study makes use of barcode-level internally replicated measurements to more accurately estimate measurement noise.

      The framework allows variant effects to be compared across experimental conditions, a task that is currently hard to do with rigor. Thus, this framework will be applicable to a large number of existing and future deep mutational scanning experiments.

      The authors refine their existing barcode transcription-based assay for GPCR signaling, and develop a clever new "relay" reporter system to boost signaling in a particular pathway. They show that these reporters can be used to measure both gain of function and loss of function effects, which many deep mutational scanning approaches cannot do.

      The use of systematic approaches to integrate and then interrogate high-dimensional deep mutational scanning data is a big strength. For example, the authors applied PCA to the variant effect results from reporters for two different MC4R signaling pathways and were able to discover variants that biased signaling through one or the other pathway. This approach paves the way for analyses of higher dimensional deep mutational scans.

      The authors use the deep mutational scanning data they collect to map how different variants affect the activation of MC4R signaling by small molecule agonists. This is an exciting idea, because developing small-molecule protein-targeting therapeutics is difficult, and this manuscript suggests a new way to map small-molecule-protein interactions.

      Weaknesses

      The authors derive insights into the relationship between MC4R signaling through different pathways and its structure. While these make sense based on what is already known, the manuscript would be stronger if some of these insights were validated using methods other than deep mutational scanning.

      Likewise, the authors use their data to identify positions where variants disrupt MC4R activation by one small molecule agonist but not another. They hypothesize these effects point to positions that are more or less important for the binding of different small molecule agonists. The manuscript would be stronger if some of these insights were explored further.

      Impact

      In this manuscript, the authors present new methods, including a statistical framework for analyzing deep mutational scanning data that will have a broad impact. They also generate MC4R variant effect data that is of interest to the GPCR community.

    1. Author response:

      Reviewer 1:

      There are no significant weaknesses to signal in the manuscript. However, in order to fully conclude that there is no obvious advantage for the linguistic dimension in neonates, it would have been most useful to test a third condition in which the two dimensions were pitted against each other, that is, in which they provide conflicting information as to the boundaries of the words comprised in the artificial language. This last condition would have allowed us to determine whether statistical learning weighs linguistic and non-linguistic features equally, or whether phonetic content is preferentially processed.

      We appreciate the reviewer's suggestion that a stream with conflicting information would provide valuable insights. In the present study, we started with a simpler case involving two orthogonal features (i.e., phonemes and voices), with one feature being informative and the other uninformative, and we found similar learning capacities for both. Future work should explore whether infants, and humans more broadly, can simultaneously track regularities in multiple speech features. However, creating a stream with two conflicting statistical structures is challenging. To use neural entrainment, the two features must lead to segmentation at different chunk sizes so that their effects lead to changes in power/PLV at different frequencies (for instance, using duplets for the voice dimension and triplets for the linguistic dimension, or vice versa). Consequently, the two dimensions would not be directly comparable within the same participant in terms of the number of distinguishable syllables/voices, memory demand, or SNR, given the 1/f decrease in amplitude of background EEG activity. This would involve comparisons between two distinct groups, counterbalancing chunk size and linguistic/non-linguistic dimension. Considering the test phase, words for one dimension would have been part-words for the other dimension. As we are measuring differences and not preferences, interpreting the results would also have been difficult. Additionally, it may be difficult to find a sufficient number of clearly discriminable voices for such a design (triplets imply 12 voices). Therefore, an entirely different experimental paradigm would need to be developed.
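
      The frequency argument above can be made concrete with a small sketch. It assumes the 4 Hz syllabic rate of the present streams and simply divides by the chunk size; the function name is hypothetical.

```python
def chunk_rate_hz(syllable_rate_hz, syllables_per_chunk):
    """Frequency at which chunk boundaries recur in a continuous syllable stream."""
    return syllable_rate_hz / syllables_per_chunk


SYLLABLE_RATE_HZ = 4.0  # syllabic rate of the streams

duplet_rate = chunk_rate_hz(SYLLABLE_RATE_HZ, 2)   # duplets recur at 2 Hz
triplet_rate = chunk_rate_hz(SYLLABLE_RATE_HZ, 3)  # triplets recur at about 1.33 Hz
```

      Because background EEG amplitude falls off roughly as 1/f, the two tagging frequencies would sit on different noise floors, which is one reason the two dimensions would not be directly comparable.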

      If such a design were tested, one possibility is that the regularities for the two dimensions are calculated in parallel, in line with the idea that the calculation of statistical regularities is a ubiquitous implicit mechanism (see Benjamin et al., 2024, for a proposed neural mechanism). Yet, similar to our present study, possibly only phonetic features would be used as word candidates. Another possibility is that only one informative feature would be explicitly processed at a time due to the serial nature of perceptual awareness, which may prioritise one feature over the other.

      Note: The reviewer’s summary contains a typo: the syllabic rate is 4 Hz (not 2 Hz) and the word rate is 2 Hz (not 4 Hz).

      Reviewer 2:

      N400: I am skeptical regarding the interpretation of the phoneme-specific ERP effect as a precursor of the N400 and would suggest toning it down. While the authors are correct in that infant ERP components are typically slower and more posterior compared to adult components, and the observed pattern is hence consistent with an adult N400, at the same time, it could also be a lot of other things. On a functional level, I can't follow the authors' argument as to why a violation in phoneme regularity should elicit an N400, since there is no evidence for any semantic processing involved. In sum, I think there is just not enough evidence from the present paradigm to confidently call it an N400.

      The reviewer is correct that we cannot definitively determine the type of processing reflected by the ERP component that appears when neonates hear a triplet after exposure to a stream with phonetic regularities. We interpreted this component as a precursor to the N400, based on prior findings in speech segmentation tasks without semantic content, where a ~400 ms component emerged when adult participants recognised pseudowords (Sander et al., 2002) or during structured streams of syllables (Cunillera et al., 2006, 2009). Additionally, the component we observed had a similar topography and timing to those labelled as N400 in infant studies, where semantic processing was involved (Parise et al., 2010; Friedrich & Friederici, 2011).

      Given our experimental design, the difference we observed must be related to the type of regularity during familiarisation (either phonemes or voices). Thus, we interpreted this component as reflecting lexical search, a process which could be triggered by a linguistic structure but which would not be relevant to a non-linguistic regularity such as voices. However, we are open to alternative interpretations. In any case, this difference between the two streams reveals that computing regularities based on phonemes versus voices does not lead to the same processes. We will revise and tone down the corresponding part of the discussion to clarify that it is just a possible interpretation of the results.

      Female and male voices: Why did the authors choose to include male and female voices? While using both female and male stimuli of course leads to a higher generalizability, it also introduces a second dimension for one feature that is not present for the other (i.e., phoneme for Experiment 1 and voice identity plus gender for Experiment 2). Hence, couldn't it also be that the infants extracted the regularity with which one gender voice followed the other? For instance, in List B, in the words, one gender is always followed by the other (M-F or F-M), while in 2/3 of the part-words, the gender is repeated (F-F and M-M). Wouldn't you expect the same pattern of results if infants learned regularities based on gender rather than identity?

      We used three female and three male voices to maximise acoustic variability. The streams were synthesised using MBROLA, which provides a limited set of artificial voices. Indeed, there were not enough French voices of acceptable quality, so we also used two Italian voices (the phonemes used existed in both Italian and French).

      Voices differ in timbre, and female voices tend to be higher pitched. However, it is sometimes difficult to categorise low-pitched female voices and high-pitched male voices. Given that gender may be an important factor in infants' speech perception (newborns, for instance, prefer female voices at birth), we conducted tests to assess whether this dimension could have influenced our results.  

      We first quantified the transitional probability matrices during the structured stream of Experiment 2, considering that there are only two types of voices: Female and Male.

      For List A, all transition probabilities are equal to 0.5 (P(M|F), P(F|M), P(M|M), P(F|F)), resulting in flat TPs throughout the stream (see Author response image 1, top). Therefore, we would not expect neural entrainment at the word rate (2 Hz), nor would we anticipate ERP differences between the presented duplets in the test phase.

      For List B, P(M|F)=P(F|M)=0.66 while P(M|M)=P(F|F)=0.33. However, this does not produce a regular pattern of TP drops throughout the stream (see Author response image 1, bottom). As a result, strong neural entrainment at 2 Hz was unlikely, although some degree of entrainment might have occasionally occurred due to some drops occurring at a 2 Hz frequency. Regarding the test phase, all three Words and only one Part-word presented alternating patterns (TP=0.66). Therefore, the difference in the ERPs between Words and Part-words in List B might be attributed to gender alternation.
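
      The gender-level transition probabilities discussed above can be estimated from any label sequence by counting adjacent pairs. The sketch below is illustrative only: the two sets of word patterns are hypothetical (they are not the actual lists), and the stream is assumed to be a random concatenation of three duplet words with no immediate repetition of the same word.

```python
import random
from collections import Counter


def transition_probabilities(labels):
    """Estimate first-order TPs P(next | current) from a sequence of labels."""
    pair_counts = Counter(zip(labels, labels[1:]))
    first_counts = Counter(labels[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}


def build_stream(words, n_words, seed=0):
    """Concatenate words at random, never repeating the same word twice in a row."""
    rng = random.Random(seed)
    stream, previous = [], None
    for _ in range(n_words):
        index = rng.choice([i for i in range(len(words)) if i != previous])
        stream.extend(words[index])
        previous = index
    return stream


# Hypothetical gender patterns for three duplet words (F = female, M = male).
alternating_words = [("F", "M"), ("M", "F"), ("F", "M")]  # gender alternates within every word
mixed_words = [("F", "F"), ("M", "M"), ("F", "M")]        # gender repeats within two of three words

tp_alternating = transition_probabilities(build_stream(alternating_words, 30000))
tp_mixed = transition_probabilities(build_stream(mixed_words, 30000))
```

      Under these assumptions, the alternating set yields P(M|F) and P(F|M) near 2/3 and P(M|M) and P(F|F) near 1/3, while the mixed set yields values near 0.5 for all four transitions, which mirrors the contrast between the two lists described above.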

      However, it seems unlikely that gender alternation alone explains the entire pattern of results, as the effect is inconsistent and appears in only one of the lists. To rule out this possibility, we analysed the effects in each list separately.

      Author response image 1.

      Transition probabilities (TPs) across the structured stream in Experiment 2, considering voices processed by gender (Female or Male). Top: List A. Bottom: List B.

      We computed the mean activation within the time windows and electrodes of interest and compared the effects of word type and list using a two-way ANOVA. For the difference between Words and Part-words over the positive cluster, we observed a main effect of word type (F(1,31) = 5.902, p = 0.021), with no effects of list or interactions (p > 0.1). Over the negative cluster, we again observed a main effect of word type (F(1,31) = 10.916, p = 0.0016), with no effects of list or interactions (p > 0.1). See Author response image 2.  

      Author response image 2.

      Difference in ERP voltage (Words – Part-words) for the two lists (A and B). W = Words; P = Part-words.

      We conducted a similar analysis for neural entrainment during the structured stream on voices. A comparison of entrainment at 2 Hz between participants who completed List A and List B showed no significant differences (t(30) = -0.27, p = 0.79). A test against zero for each list indicated significant entrainment in both cases (List A: t(17) = 4.44, p = 0.00036; List B: t(13) = 3.16, p = 0.0075). See Author response image 3.

      Author response image 3.

      Neural entrainment at 2 Hz during the structured stream of Experiment 2 for Lists A and B.
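      The two kinds of test reported for entrainment (a between-list comparison and a test against zero for each list) can be sketched as follows. This is an illustration under stated assumptions: the data are randomly generated placeholders, the helper names are hypothetical, and the two-sample version uses the pooled-variance (Student) form, which matches the reported degrees of freedom (t(30) with 18 and 14 participants) but is not confirmed by the text. Only the statistic and degrees of freedom are computed; p-values would need a t-distribution from a statistics library.

```python
import math
import random
import statistics


def one_sample_t(values, popmean=0.0):
    """Return (t, df) for a one-sample t-test against popmean."""
    n = len(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    return (statistics.mean(values) - popmean) / standard_error, n - 1


def independent_t(a, b):
    """Return (t, df) for a two-sample t-test with pooled variance."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    standard_error = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error, na + nb - 2


# Placeholder entrainment values (arbitrary units), NOT the study data.
rng = random.Random(0)
list_a = [rng.gauss(0.5, 0.5) for _ in range(18)]
list_b = [rng.gauss(0.5, 0.5) for _ in range(14)]

t_a, df_a = one_sample_t(list_a)            # test against zero, List A
t_b, df_b = one_sample_t(list_b)            # test against zero, List B
t_ab, df_ab = independent_t(list_a, list_b)  # List A vs List B
```

      With 18 and 14 participants, the degrees of freedom come out as 17, 13, and 30, consistent with the values reported above.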

      Words entrainment over occipital electrodes: Do you have any idea why the duplet entrainment effect occurs over the electrodes it does, in particular over the occipital electrodes (which seems a bit unintuitive given that this is a purely auditory experiment with sleeping neonates).

      Neural entrainment might be considered as a succession of evoked responses induced by the stream. After applying an average reference in high-density EEG recordings, the auditory ERP in neonates typically consists of a central positivity and a posterior negativity, with a source located at the electrical zero in a single-dipole model (i.e., approximately in the superior temporal region; Dehaene-Lambertz & Dehaene, 1994). In adults, because of the average reference (i.e., the sum of voltages is equal to zero at each time point) and because the electrodes cannot capture the negative pole of the auditory response, the negativity is distributed around the head. In infants, however, the brain is higher within the skull, allowing for a more accurate recording of the negative pole of the auditory ERP (see Author response image 4 for the location of electrodes in an infant head model).

      Besides the posterior electrodes, we can see some entrainment on more anterior electrodes that probably corresponds to the positive pole of the auditory ERP.

      Author response image 4.

      International 10–20 sensors' location on the skull of an infant template, with the underlying 3-D reconstruction of the grey-white matter interface and projection of each electrode to the cortex. Computed across 16 infants (from Kabdebon et al, Neuroimage, 2014). The O1, O2, T5, and T6 electrodes project lower than in adults.
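      The average-reference property mentioned above (voltages summing to zero at each time point) follows from subtracting the mean across electrodes. A minimal sketch for a single time point, with hypothetical voltages and a hypothetical function name:

```python
def average_reference(voltages):
    """Re-reference one time point by subtracting the mean across electrodes."""
    mean_voltage = sum(voltages) / len(voltages)
    return [v - mean_voltage for v in voltages]


# Hypothetical voltages (microvolts) at four electrodes for one time point.
sample = [4.0, 1.0, -2.0, 1.0]
rereferenced = average_reference(sample)
# The re-referenced values sum to zero, so any positivity recorded at some
# electrodes must be balanced by negativity at others.
```

      This is why a positive pole that is well captured must be accompanied by negativity somewhere on the recorded montage, whether concentrated (as described for infants) or spread around the head (as described for adults).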

      Reviewer 3:

      (1) While it's true that voice is not essential for language (i.e., sign languages are implemented over gestures; the use of voices to produce non-linguistic sounds, like laughter), it is a feature of spoken languages. Thus I'm not sure if we can really consider this study as a comparison between linguistic and non-linguistic dimensions. In turn, I'm not sure that these results show that statistical learning at birth operates on non-linguistic features, being voices a linguistic dimension at least in spoken languages. I'd like to hear the authors' opinions on this.

      On one hand, it has been shown that statistical learning (SL) operates across multiple modalities and domains in human adults and animals. On the other hand, SL is considered essential for infants to begin parsing speech. Therefore, we aimed to investigate whether SL capacities at birth are more effective on linguistic dimensions of speech, potentially as a way to promote language learning.

      We agree with the reviewer that voices play an important role in communication (e.g., for identifying who is speaking); however, they do not contribute to language structure or meaning, and listeners are expected to normalize across voices to accurately perceive phonemes and words. Thus, voices are speech features but not linguistic features. Additionally, in natural speech, there are no abrupt voice changes within a word as in our experiment; instead, voice changes typically occur on a longer timescale and involve only a limited number of voices, such as in a dialogue. Therefore, computing regularities based on voice changes would not be useful in real-life language learning. We considered that contrasting syllables and voices was an elegant way to test SL beyond its linguistic dimension, as the experimental paradigm is identical in both experiments.  

      Along the same line, in the Discussion section, the present results are interpreted within a theoretical framework showing statistical learning in auditory non-linguistic (string of tones, music) and visual domains as well as visual and other animal species. I'm not sure if that theoretical framework is the right fit for the present results.

      (2) I'm not sure whether the fact that we see parallel and independent tracking of statistics in the two dimensions of speech at birth indicates that newborns would be able to do so in all the other dimensions of the speech. If so, what other dimensions are the authors referring to?

      The reviewer is correct that demonstrating the universality of SL requires testing additional modalities and acoustic dimensions. However, we postulate that SL is grounded in a basic mechanism of long-term associative learning, as proposed in Benjamin et al. (2024), which relies on a slow decay in the representation of a given event. This simple mechanism, capable of operating on any representational output, accounts for many types of sequence learning reported in the literature (Benjamin et al., in preparation). We will revise the discussion section to clarify this theoretical framework.

      (3) Lines 341-345: Statistical learning is an evolutionarily ancient learning mechanism, but I do not think that the present results are showing it. This is a study on human neonates and adults; there are no other animal species involved, therefore I do not see a connection with the evolutionary history of statistical learning. It would be much more interesting to make claims on the ontogeny (rather than phylogeny) of statistical learning, and what regularities newborns are able to detect right after birth. I believe that this is one of the strengths of this work.

      We did not intend to make claims about the phylogeny of SL. Since SL appears to be a learning mechanism shared across species, we use it as a framework to suggest that SL may arise from general operational principles applicable to diverse neural networks. Thus, while it is highly useful for language acquisition, it is not specific to it. We will revise this section to tone down our claims.  

      (4) The description of the stimuli in Lines 110-113 is a bit confusing. In Experiment 1, e.g., "pe" and "tu" are both uttered by the same voice, correct? ("random voice each time" is confusing). Whereas in Experiment 2, e.g., "pe" and "tu" are uttered by different voices, for example, "pe" by yellow voice and "tu" by red voice. If this is correct, then I recommend the authors to rephrase this section to make it more clear.

      To clarify, in Experiment 1, the voices were randomly assigned to each syllable, with the constraint that no voice was repeated consecutively. This means that syllables within the same word were spoken by different voices, and each syllable was heard with various voices throughout the stream. As a result, neonates had to retrieve the words based solely on syllabic patterns, without relying on consistent voice associations or specific voice relationships.

      In Experiment 2, the design was orthogonal: while the syllables were presented in a random order, the voices followed a structured pattern. Similar to Experiment 1, each syllable (e.g., “pe” and “tu”) was spoken by different voices. The key difference is that in Experiment 2, the structured regularities were applied to the voices rather than the syllables. In other words, the “green” voice was always followed by the “red” voice for example but uttered different syllables.

      We will revise the methods section to clarify these important points.

      (5) Line 114: the sentence "they should compute a 36 x 36 TPs matrix relating each acoustic signal, with TPs alternating between 1/6 within words and 1/12 between words" is confusing as it seems like there are different acoustic signals. Can the authors clarify this point?

      Thank you for highlighting this point. To clarify, our suggestion is that neonates might not track regularities between phonemes and voices as separate features. Instead, they may treat each syllable-voice combination as a distinct item—for example, "pe" spoken by the "yellow" voice is one item, while "pe" spoken by the "red" voice is another. Under this scenario, there would be a total of 36 unique items (6 syllables × 6 voices), and infants would need to track regularities between these 36 combinations.

      We will rephrase this sentence in the manuscript to make it clearer.
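      The 36-item scenario can be made concrete by building the transition matrix explicitly. The sketch below rests on assumptions that are not all stated in the text: three two-syllable words with placeholder syllable names, a word never immediately followed by itself (which gives a between-word syllable TP of 1/2), and voices drawn uniformly from all six on every syllable. The last assumption reproduces the 1/6 and 1/12 values in the quoted sentence but ignores the no-consecutive-repetition constraint on voices described for Experiment 1.

```python
from fractions import Fraction
from itertools import product

# Placeholder syllable and voice names (hypothetical, not the real stimuli).
words = [("a1", "a2"), ("b1", "b2"), ("c1", "c2")]
voices = [f"v{i}" for i in range(1, 7)]

syllables = [s for word in words for s in word]
items = list(product(syllables, voices))  # 6 syllables x 6 voices = 36 items

second_of = dict(words)                       # word onset -> second syllable
onset_of = {second: first for first, second in words}
onsets = [first for first, _ in words]


def tp(current, following):
    """TP between two syllable-voice items under the assumptions above."""
    syl_cur, syl_next = current[0], following[0]
    voice_p = Fraction(1, len(voices))  # assumed uniform over all voices
    if syl_cur in second_of:
        # Within a word: the second syllable is fully determined.
        syl_p = Fraction(1) if second_of[syl_cur] == syl_next else Fraction(0)
    else:
        # Between words: any other word may follow with equal probability.
        allowed = [o for o in onsets if o != onset_of[syl_cur]]
        syl_p = Fraction(1, len(allowed)) if syl_next in allowed else Fraction(0)
    return syl_p * voice_p


matrix = {(a, b): tp(a, b) for a in items for b in items}
nonzero_values = {p for p in matrix.values() if p > 0}
row_sums = {a: sum(matrix[(a, b)] for b in items) for a in items}
```

      Under these assumptions the only non-zero entries are 1/6 (within words) and 1/12 (between words), and each of the 36 rows sums to 1.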

    1. eLife Assessment

      In their study, Neiswender et al. provide important insights into how BicD2 variants linked to spinal muscular atrophy alter dynein activity and cargo specificity. The authors present convincing evidence that disease-associated mutations lead to interactome changes, supported by additional validation of the BicD2/HOPS complex and discussion of their functional implications. This well-executed study offers invaluable datasets and a strong foundation for future exploration of disease mechanisms.

    2. Reviewer #1 (Public review):

      In this work, Neiswender and colleagues test the hypothesis that mutations in BicD2 that are associated with SMALED alter BicD2-cargo interactions. To do this, they first establish the WT BicD2 cargo interactome (using a proximity-dependent biotin ligase screen with Turbo-ID on the BicD2 C-terminus). In addition to known cargo interactors, they also identified many proteins in the HOPS complex. Interestingly, they find that the HOPS complex may interact with BicD2 in a different manner than other known cargos. The authors also show that while BicD2 is required for HOPS complex localization, on average, depletion of BicD2 from HeLa and Cos7 cells causes HOPS and lysosome mislocalization that is consistent with Kinesin-1 trafficking defects, rather than dynein. The authors also use proximity biotin ligase approaches to define the cargo interactome of three BicD2 variants associated with SMALED. One variant (R747C) has the most altered cargo interactome. The authors highlight one protein in particular, GRAMD1A, that is only found in the R747C dataset and mislocalizes specifically when R747C is expressed.

      The work in this manuscript is of a very high quality and contributes important findings to the field.

      Comments on revisions:

      The authors did a great job addressing the points I brought up!

    3. Reviewer #2 (Public review):

      Neiswender et al. investigated the interactomes between wild-type BICD2 and BICD2 mutants that are associated with Spinal Muscular Atrophy with Lower Extremity Predominance (SMALED2). Although BICD2 has previously been implicated in SMALED2, it is unclear how mutations in BICD2 may contribute to disease symptoms. In this study, the authors characterize the interactome of wild-type BICD2 and identify potential new cargos including the HOPS complex. The authors then chose three SMALED2-associated BICD2 mutants and compared each mutant interactome to that of wild-type BICD2. Each mutant had a change in the interactome, with the most drastic being BICD2_R747C, a mutation in the cargo binding domain of BICD2. This mutant displayed less interaction with a potential new BICD2 cargo, the HOPS complex. Additionally, it displayed more interaction with an ER protein, GRAMD1A.

      The data in the paper is generally strong but the major conclusions of this paper need more evidence to be better supported.

      (1) The authors use cells that have been engineered to express the different BICD2 constructs. As shown in Figure 4B, the authors see wide expression of BICD2_WT throughout the cell. However, WT BICD2 usually localizes to the TGN. This widespread localization introduces some uncertainty about the interactome data. The authors should either try to verify the interaction data (specifically with the HOPS complex and GRAMD1A) by immunoprecipitating endogenous BICD2 or by repeating their interactome experiment in Figure 1 using BICD2 knockout cells that express the BICD2_WT construct. This should also be done to verify the immunoprecipitation and microscopy data shown in Figure 7.

      (2) The authors conclude that cargo transport defects resulting from BICD2 mutations may contribute to SMALED2 symptoms. However, the authors are unable to determine if BICD2 directly binds to the potential new cargo, the HOPS complex. To address this, the authors could purify full-length WT BICD2 and perform in vitro experiments. Furthermore, the authors were unable to identify the minimal region of BICD2 needed for HOPS interaction. The authors could expand on the experiment attempted with the extended BICD2 C-terminal using a deltaCC1 construct, which could also be used for in vitro experiments.

      (3) Again, the authors conclude that BICD2 mutants cause cargo transport defects that are likely to lead to SMALED2 symptoms. This would be better supported if the authors are able to find a protein relevant to SMALED2 and examine if/how its localization is changed under expression of the BICD2 mutants. The authors currently use the HOPS complex and GRAMD1A as indicators of cargo transport defects, but it is unclear if these are relevant to SMALED2 symptoms.

      Comments on revisions:

      The investigators did a good job in responding to our initial concerns (see below). We appreciate that they used siRNA to address our first comment because they do not have a BICD2 KO cell line. We appreciated that they added a new section in the Discussion to address the limitations of the study.

      In regards to our first comment about the BICD2 WT construct localization, since they use KD to validate the interaction between their BICD2 WT construct and VPS41, it would be nice to see localization of this construct under the KD condition. However, the binding they presented in Sup. Fig 1B does look convincing, so this may not be necessary.

      Overall, I believe this revision has satisfied our previous concerns.

    4. Reviewer #3 (Public review):

      Summary:

      BicD2 is a motor adapter protein that facilitates cellular transport pathways, which are impacted by human disease mutations of BicD2 causing spinal muscular atrophy with lower extremity dominance (SMALED2). The authors provide evidence that some of these mutations result in interactome changes, which may be the underlying cause of the disease. This is supported by proximity biotin ligation screens, immunoprecipitation and cell biology assays. The authors identify several novel BicD2 interactions such as the HOPS complex that participates in the fusion of late endosomes and autophagosomes with lysosomes, which could have important functions. Three BicD2 disease mutants studied had changes in the interactome, which could be an underlying cause for SMALED2. The study extends our understanding of the BicD2 interactome under physiological conditions, as well as of the changes of cellular transport pathways that result in SMALED2. It will be of great interest for the BicD2 and dynein fields.

      Strengths:

      Extensive interactomes are presented for both WT BicD2 as well as the disease mutants, which will be valuable for the community. The HOPS complex was identified as a novel interactor of BicD2, which is important for fusion of late endosomes and lysosomes, which is of interest, since some of the BicD2 disease mutations result in Golgi-fragmentation phenotypes. The interaction with the HOPS complex is affected by the R747C mutation, which also results in a gain of function interaction with GRAMD1A.

      Weaknesses:

      The manuscript should be strengthened by further evidence of the BicD2/HOPS complex interaction and the functional implications for spinal muscular atrophy by changes in the interactome through mutations. Which functional implications does the loss of the BicD2/HOPS complex interaction and the gain of function interaction with GRAMD1A have in the context of the R747C mutant?

      Major points:

      (1) In the biotin proximity ligation assay, a large number of targets were identified, but it is not clear why only the HOPS complex was chosen for further verification. Immunoprecipitation was used for target verification, but due to the very high number of targets identified in the screen, and the fact that the HOPS complex is a membrane protein that could potentially be immunoprecipitated along with lysosomes or dynein, additional experiments to verify the interaction of BicD2 with the HOPS complex (reconstitution of a complex in vitro, GST-pull down of a complex from cell extracts or other approaches) are needed to strengthen the manuscript.

      (2) In the biotin proximity ligation assay, a large number of BicD2 interactions were identified that are distinct between the mutant and the WT, but it was not clear why particularly GRAMD1A was chosen as gain of function interaction, and what the functional role of a BicD2/GRAMD1A interaction may be. A Western blot shows a strengthened interaction with the R747C mutant but GRAMD1A also interacts with WT BicD2.

      (3) Furthermore, functional implications of changed interactions with HOPS and GRAMD1A in the R747C mutant are unclear. Additional experiments are needed to establish the functional implication of the loss of the BicD2/HOPS interaction in the BicD2/R747C mutant. For the GRAMD1A gain of function interaction, according to the authors a significant amount of the protein localized with BicD2/R747C at the centrosomal region. This changed localization is not very clear from the presented images (no centrosomal or other markers were used, and the changed localization could also be an effect of dynein hyper activation in the mutant). Furthermore, the functional implication of a changed localization of GRAMD1A is unclear from the presented data.

      Comments on revisions:

      After a major revision, the manuscript is much improved. Additional evidence for the HOPS complex/BicD2 interaction was provided (the interaction was identified in multiple independent screens), and while the authors unfortunately were not able to confirm a direct interaction between BicD2 and the HOPS complex, additional caveats were added in the results section, which clearly state these limitations. The authors also included a very nice discussion of potential physiological roles of the GRAMD1A mislocalization in the disease mutant, which could potentially affect cholesterol transport and homeostasis. Limitations of the presented approaches were clearly described as caveats.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      (1) I was surprised at the effect of BicD2 knockdown on LAMP (and VPS41) localization, which really suggests that in HeLa and Cos7 cells, BicD2 regulation of Kinesin-1 (rather than dynein) is the primary driver of lysosome localization. The KIF5B-knockout rescue of the BicD2 overexpression phenotype was a very powerful result that supports this conclusion. Have the authors looked at other cargos, e.g., Golgi or centrosomes in G2? Can the authors include more discussion about what this result means or how they imagine dynein and kinesin-1's interaction with BicD2 is regulated?

      We have performed this experiment as requested by the reviewer. The BICD2 siRNA also resulted in Golgi fragmentation and localization defects of the centrosome in cells that are in G2 phase of the cell cycle (Supplemental Fig. 2E-H).

      We have also added additional discussion related to how BICD2 might couple cargos to opposite polarity motors (lines 440-447). Interestingly, the lysosome motility defect we observe upon BICD2 knock down has similarity to the RAB6A trafficking phenotype. In both cases, what one sees is a sharp reduction in the number of motile particles rather than a reversal in the direction of motility. This suggests that both motors are involved in the steady state distribution of these cargoes.

      (2) Have the authors examined if the SMALED mutants show diminished or increased binding to KIF5B? While the authors are correct that the mutations could hyperactivate dynein because they reduce BicD2 autoinhibition, it is possible that the SMALED mutants hyperactivate dynein because they no longer bind kinesin. This would be particularly interesting, given the complex relationship between BicD2 regulation of dynein and kinesin that the authors show in Figure 3. 

      Thank you for this suggestion. We had not considered this. We have added this experiment in the revised manuscript (Supplemental Fig. 3H, I). We find that the interaction between wild-type BICD2 and KIF5B is only slightly above the control. This is consistent with published findings that indicate that although the isolated CC2 domain of BICD2 is able to interact with KIF5B, the binding is lower for the full-length protein. This is most likely due to the intramolecular interaction between the N and C-termini of BICD2 partially blocking the binding site. Interestingly, however, all three mutants display a reduced interaction with KIF5B, with the reduction being most severe for the cargo domain binding mutants. Thus, as we discuss in the revised manuscript, dynein hyperactivity likely results from increased binding to dynein and a concurrent reduction in binding to KIF5B.

      (3) What is already known about the protein GRAMD1A? Did the authors choose to focus on GRAMD1A because it was the only novel interaction found in the SMALED mutant interactomes, or was this protein interesting for a different reason? Does the known function of GRAMD1A explain the potential dysfunction of cells expressing BICD2_R747C or patients who have this mutation? More discussion of this protein and why the authors focused on it would really strengthen the manuscript. 

      We chose to focus on GRAMD1A for a few reasons. The protein that displayed the highest gain of function interaction with BICD2_R747C in our proteomic analysis was Plastin. However, using at least one antibody against Plastin, we were not able to validate this result. In addition, we had previously performed a proteomic screen using a BICD2_R747A (arginine to alanine) mutation and had compared the interactome of this mutant to the wild-type protein. Plastin was not recovered in that screen but the top hit was GRAMD1A. Given that we isolated GRAMD1A in two separate screens as a gain of function interaction, we believed the result was worth focusing on for follow-up studies.

      GRAMD1A (as well as its paralogs GRAMD1B and C) functions in non-vesicular transport of accessible cholesterol from the plasma membrane to the ER. We have added additional discussion on GRAMD1A (lines 484-495). While we observe a relocalization of GRAMD1A in mutant-expressing cells, we do not know whether this is sufficient to result in cholesterol transport defects. There are several routes for cholesterol uptake, with the GRAMD1A pathway representing just one of these routes.

      Reviewer #2 (Public review):

      (1) The authors use cells that have been engineered to express the different BICD2 constructs. As shown in Figure 4B, the authors see wide expression of BICD2_WT throughout the cell. However, WT BICD2 usually localizes to the TGN. This widespread localization introduces some uncertainty about the interactome data. The authors should either try to verify the interaction data (specifically with the HOPS complex and GRAMD1A) by immunoprecipitating endogenous BICD2 or by repeating their interactome experiment in Figure 1 using BICD2 knockout cells that express the BICD2_WT construct. This should also be done to verify the immunoprecipitation and microscopy data shown in Figure 7. 

      The localization of our exogenous BICD2-mNeon constructs is similar to what others have seen using GFP tagged versions of the protein (for example Peeters et al., 2013). In addition, in the experiment shown in the initial version of the paper, we were focusing on the centrosomal localization of BICD2. However, our BICD2-mNeon construct is also observed at the Golgi, in addition to its localization throughout the cell (Supplemental Fig. 3C). 

      We attempted to perform a co-immunoprecipitation experiment using endogenous proteins as suggested by the reviewer. Although a rabbit polyclonal antibody was able to co-immunoprecipitate RANBP2 with BICD2, the antibody complex of heavy and light chains co-migrated with the VPS41 band and was abundantly detected by the secondary antibody used in the western blot. Thus, we were not able to conclude whether VPS41 was present in the co-immunoprecipitate. We attempted the experiment using a mouse monoclonal antibody against BICD2. However, this antibody failed in the immunoprecipitation experiment and we could not detect either RANBP2 (a validated cargo) or VPS41. Although the VPS41 antibody we used in the paper works for western blot, it does not recognize the native protein. Thus, despite our best efforts, we are not able to draw a valid conclusion from these co-IP experiments.

      It is beyond the scope of the revision to perform the entire experiment in a BICD2 KO cell line. A BICD2 KO cell line does not exist and it would take several months to make such a knockout in the FLP IN HEK cells that were used in this manuscript. However, we have validated the interaction between BICD2 and VPS41 in cells that have been depleted of endogenous BICD2 (Supplemental Fig. 1B). The transgenic constructs contain silent mutations that make them refractory to bicD2 siRNA1. Thus, although endogenous BICD2 is depleted by the siRNA treatment, wild-type and mutant BICD2_TurboID are not. A similar approach was also used to demonstrate the gain of function interaction between BICD2_R747C and GRAMD1A in cells depleted of endogenous BICD2 (Supplemental Fig. 5A).

      (2) The authors conclude that cargo transport defects resulting from BICD2 mutations may contribute to SMALED2 symptoms. However, the authors are unable to determine if BICD2 directly binds to the potential new cargo, the HOPS complex. To address this, the authors could purify full-length WT BICD2 and perform in vitro experiments. Furthermore, the authors were unable to identify the minimal region of BICD2 needed for HOPS interaction. The authors could expand on the experiment attempted with the extended BICD2 C-terminal using a deltaCC1 construct, which could also be used for in vitro experiments. 

      We have not been successful in purifying full length BICD2 in bacteria, perhaps due to solubility issues. However, we have added several experiments to further examine the nature of the BICD2-HOPS complex interaction.

      We have performed the experiment as requested. We find that BICD2_delCC1 is able to bind VPS41, but not as efficiently as the full length protein. However, unlike the CC3 cargo binding construct, the BICD2_delCC1 construct also displays reduced binding to RANBP2 (Supplemental Fig. 1D). We attribute this defect to either the intramolecular BICD2 interaction blocking cargo binding or potentially to a folding defect in the BICD2_delCC1 construct. Thus, although we performed this experiment as suggested by the reviewer, we are not able to make a solid conclusion.

      Based on the fact that VPS41 was the most abundantly detected HOPS component in the BICD2 interactome, we hypothesized that it was the point of direct contact between BICD2 and the HOPS complex. However, contrary to our hypothesis, depletion of VPS41 did not compromise the association between BICD2 and VPS16 and VPS18 (Supplemental Fig. 1E). Thus, we conclude that there are multiple points of contact between BICD2 and the HOPS complex, with BICD2 perhaps recognizing a common motif or domain present in these proteins.

      We next attempted to map the interaction site using AlphaFold2 Multimer. Although we were able to use this platform to predict a high confidence interaction between BICD2 and RAB6A (consistent with published results), this did not yield a high confidence prediction for the BICD2-HOPS complex interaction.

      Ultimately, although we added several new experiments, we were not able to determine the minimal region for binding, nor whether the interaction is direct or indirect. These caveats are clearly stated in the revised manuscript. Regardless of whether the interaction is direct or indirect, however, it is noteworthy that the association between BICD2 and the HOPS complex is reduced by the R747C SMALED2 mutation.

      (3) Again, the authors conclude that BICD2 mutants cause cargo transport defects that are likely to lead to SMALED2 symptoms. This would be better supported if the authors are able to find a protein relevant to SMALED2 and examine if/how its localization is changed under expression of the BICD2 mutants. The authors currently use the HOPS complex and GRAMD1A as indicators of cargo transport defects, but it is unclear if these are relevant to SMALED2 symptoms. 

      This point was addressed in the general discussion. Given the complexity of SMALED2 (autosomal dominant disorder; variable phenotypic severity; adult-onset disorder in many instances, etc.), it is very hard to model in a cell line. One of the reasons we focused our studies on the HOPS complex, and VPS41 in particular, was that mutations in VPS41 are associated with spinocerebellar ataxia, a neurodevelopmental disorder. However, we cannot conclude whether the reduction/loss of interaction of BICD2 with the HOPS complex is causative for disease symptoms. We also cannot conclude at present whether the mis-targeting of GRAMD1A is causative for disease symptoms. We have discussed these caveats in the revised manuscript and have included a section in the discussion that specifically lists the limitations of our study (lines 511-530).

      With that said, we can conclude that mutations in the cargo binding domain of BICD2 result in dynein hyperactivity, altered BICD2 localization in hippocampal neurons, and reduced neurite growth. Given that we observe interactome changes in HEK cells, it is plausible that interactome changes also exist in motor neurons. However, even in the absence of interactome changes, hyperactivation of dynein alone can result in cargo trafficking defects; the same cargos can be excessively localized in the soma vs the axon. As noted previously, however, a thorough examination of these points will require the use of genetically engineered motor neurons and is beyond the scope of the current study.

      Reviewer #3 (Public review):

      Strengths: 

      Extensive interactomes are presented for both WT BicD2 as well as the disease mutants, which will be valuable for the community. The HOPS complex was identified as a novel interactor of BicD2, which is important for fusion of late endosomes and lysosomes, which is of interest, since some of the BicD2 disease mutations result in Golgi-fragmentation phenotypes. The interaction with the HOPS complex is affected by the R747C mutation, which also results in a gain-of-function interaction with GRAMD1A. 

      Weaknesses: 

      The manuscript should be strengthened by further evidence of the BicD2/HOPS complex interaction and the functional implications for spinal muscular atrophy by changes in the interactome through mutations. Which functional implications does the loss of the BicD2/HOPS complex interaction and the gain of function interaction with GRAMD1A have in the context of the R747C mutant? 

      (1) In the biotin proximity ligation assay, a large number of targets were identified, but it is not clear why only the HOPS complex was chosen for further verification. Immunoprecipitation was used for target verification, but due to the very high number of targets identified in the screen, and the fact that the HOPS complex is a membrane protein that could potentially be immunoprecipitated along with lysosomes or dynein, additional experiments to verify the interaction of BicD2 with the HOPS complex (reconstitution of a complex in vitro, GST-pull down of a complex from cell extracts or other approaches) are needed to strengthen the manuscript. 

      As discussed for reviewer 2 (point 2), we have added several experiments to better characterize the BICD2-HOPS complex interaction.

      We chose to focus on the HOPS complex for a few reasons. The list of interactions that displayed a >2 fold enrichment vs control was actually not that large (66 proteins). Within this list, we identified 4 out of 6 HOPS components and VPS41 was the 5th most enriched protein in the BICD2 interactome (RANBP2 by contrast was #16 on this list). Furthermore, the BICD2_R747C mutation resulted in greatly reduced interaction of BICD2 with the HOPS complex, whereas its interaction with dynein was increased. These results indicate that these proteins are not simply immunoprecipitating with the BICD2/dynein complex. Apart from the HOPS complex, lysosomal proteins were not present in the interactome, making it unlikely that they were identified due to non-specific interactions between BICD2 and co-precipitating lysosomes.

      (2) In the biotin proximity ligation assay, a large number of BicD2 interactions were identified that are distinct between the mutant and the WT, but it was not clear why GRAMD1A in particular was chosen as a gain-of-function interaction, and what the functional role of a BicD2/GRAMD1A interaction may be. A Western blot shows a strengthened interaction with the R747C mutant, but GRAMD1A also interacts with WT BicD2.

      Please see the above discussion on GRAMD1A (reviewer 1, point 3). GRAMD1A comes down non-specifically with the binding control as well as BICD2_wt. We therefore conclude that wildtype BICD2 does not specifically interact with GRAMD1A above background levels (Fig. 7, compare the control lane vs BICD2-wt).

      (3) Furthermore, the functional implications of changed interactions with HOPS and GRAMD1A in the R747C mutant are unclear. Additional experiments are needed to establish the functional implication of the loss of the BicD2/HOPS interaction in the BicD2/R747C mutant. For the GRAMD1A gain of function interaction, according to the authors, a significant amount of the protein localized with BicD2/R747C at the centrosomal region. This changed localization is not very clear from the presented images (no centrosomal or other markers were used, and the changed localization could also be an effect of dynein hyperactivation in the mutant). Furthermore, the functional implication of a changed localization of GRAMD1A is unclear from the presented data. 

      We have performed the experiment as requested by the reviewer. The re-localized GRAMD1A localizes adjacent to Pericentrin, a centrosomal marker (Supplemental Fig. 5B-F). GRAMD1A and BICD2 appear to co-localize in a ring around the Pericentrin marked centrosome.

      The re-localization of GRAMD1A to the centrosomal area by BICD2_R747C appears to be unique to this mutant, and not simply a consequence of dynein hyperactivity. The other two mutants tested, BICD2_N188T and BICD2_R694C, also hyperactivate dynein. However, they do not cause the dramatic re-localization of GRAMD1A that we observe with the BICD2_R747C mutant. We conclude that this altered localization results from a gain-of-function interaction with BICD2_R747C as well as dynein hyperactivity.

      Reviewer #1 (Recommendations for the authors): 

      Please add a discussion about how the authors calculated the Cell Body enrichment shown in 5E. Is this a ratio of the BicD2 intensity in the cell body:axon? Did the authors normalize for potential differences in BicD2 variant expression? 

      Yes, it is a ratio of the intensity between the cell body and axon. This is described in the Methods section under quantification (lines 725-728). We attempted to image cells expressing similar amounts of protein.  

      Reviewer #2 (Recommendations for the authors): 

      (1) The paper would benefit from an explanation of why the authors chose to follow up on the HOPS complex out of all proteins identified in the interactome experiment. 

      This discussion has been included in the revised manuscript.  

      (2) In panel B of Supplementary Figure 1, RFP mTurbo has a significant amount of non-specific binding to VPS18. The authors note that in the initial interactome experiment, there was a twofold enrichment of this protein in BICD2 pulldown versus control. Do the authors have a co-IP that has a similar enrichment?

      VPS18 occasionally comes down non-specifically with our RFP-TurboID control. However, the interaction is specific, because very little VPS18 comes down with the BICD2 construct lacking the cargo binding domain (Fig. 2B). An additional example of the VPS18 binding result is shown in Supplemental Fig. 1E.

      (3) In Figure 2B, there seems to be less Vps18 in the input for BICD2 delCC3-mTurbo. Do the authors have a blot where there is equal input across all conditions? This may increase the slight signal seen in the pulldown.

      The blot shown in Supplemental Fig. 1C has equivalent load for VPS18 across all lanes. Minimal binding of VPS18 is observed with the BICD2_delCC3 sample.

      (4) In Figure 3, can the authors show representative images of GFP-VPS-41 and LAMP1 localization that are at the same magnification? It currently looks as if the localization pattern differs between the two under control siRNA. Alternatively, the authors should show colocalization of the two, as the authors note both are localized to late endosomes/lysosomes. 

      We have provided additional images at the same magnification (Supplemental Fig. 2IK). Co-localization between GFP-VPS41 (detected with a rabbit polyclonal antibody against GFP) and LAMP1 (also detected with a rabbit polyclonal antibody) is not possible because both primary antibodies were raised in the same species. However, a published study has shown that a subset of V5-tagged VPS41 vesicles is positive for LAMP1. We have cited this study.

      (5) In Supplementary Figure 2, the authors should show the knockdown efficiency of both BICD2 siRNAs. The VPS41 staining in panel B looks like there is less perinuclear localization than with BICD2 siRNA 1. Is this because of knockdown efficiency?

      We have included this data (Supplemental Fig. 2B). Both siRNAs are capable of depleting BICD2. However, we do see slightly more effective knock down with siRNA-1.

      (6) The data in Figure 4A would be more striking with quantification. 

      Quantifications have been provided (Supplemental Fig. 3A,B). In a one-way ANOVA, BICD2_R747C is the only mutant that shows a significant change. Variability in the binding experiment resulted in the other two mutants not showing a statistically significant change. However, the additional assays that are provided (centrosomal enrichment of BICD2 and peroxisome tethering) clearly demonstrate that the R694C mutant also results in dynein hyperactivation. It should be noted that the analysis done by Huynh et al., 2017 also showed a binding increase between BICD2 disease mutants and dynein. However, due to binding variability, their results were not statistically significant.

      (7) Can the authors explain how centrosome enrichment is calculated in Figure 4F? The intensity of colocalization with the centrosome between mutant constructs visually does not look significantly different. Is this a ratio of centrosome localization to cell body localization? 

      We apologize for this omission. This has been added to the quantification section of the Methods (lines 721-723). Yes, it is a ratio of mean signal at the centrosome vs mean signal in the rest of the cell.
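
      The ratio described above (mean signal at the centrosome divided by mean signal in the rest of the cell) can be illustrated with a minimal sketch. It assumes a single segmented cell supplied as a 2D grid of pixel intensities and a boolean mask marking the centrosomal region; the function names, the mask, and the toy values are hypothetical and are not taken from the manuscript's Methods.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    if not values:
        raise ValueError("region contains no pixels")
    return sum(values) / len(values)


def enrichment_ratio(image, region_mask):
    """Mean intensity inside the masked region divided by the mean
    intensity of all remaining pixels of the cell.

    image:       2D list of pixel intensities for one segmented cell
    region_mask: 2D list of booleans, True where a pixel belongs to the
                 region of interest (here, the centrosome)
    """
    inside, outside = [], []
    for image_row, mask_row in zip(image, region_mask):
        for pixel, in_region in zip(image_row, mask_row):
            (inside if in_region else outside).append(pixel)
    return mean(inside) / mean(outside)


# Toy 3x3 "cell" with one bright pixel standing in for the centrosome.
image = [
    [10, 10, 10],
    [10, 40, 10],
    [10, 10, 10],
]
mask = [
    [False, False, False],
    [False, True, False],
    [False, False, False],
]

ratio = enrichment_ratio(image, mask)  # 40 / 10
```

      A ratio above 1 indicates enrichment in the masked region. The same function would apply to a cell body versus axon comparison if the mask marked the cell body and the image were restricted to cell body and axon pixels.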

      (8) The current input blot in Supplementary Figure 4A shows increasing amounts of importin beta across the lanes. Do the authors have a blot of panel A in which the input level of importin beta is the same between constructs? Does this change the level of importin beta that is pulled down?

      Another replicate of this experiment has been shown. We have retained the original experiment as well (Supplemental Figs. 4A, B).

      Reviewer #3 (Recommendations for the authors): 

      Minor points: 

      (1) In the .pdf version of the supplemental tables, the text is often cropped. It is recommended to delete the .pdf versions and just retain the Excel versions of the tables. 

      We are not sure why this occurred. Excel files were provided. In addition, the raw data from the mass spectrometry experiments will also be included with the final version of the manuscript.

      (2) Line 367: For transport of Rab6, kinesin-1 is the dominant motor, but dynein is still active and engaging in a tug of war (Serra Marquez et al 2022). 

      Thank you. We have revised our text to include this discussion. In this regard, LAMP1 vesicles are similar. Loss of BICD2 results in a greater number of stationary vesicles rather than vesicles that are excessively targeted towards the microtubule minus end.

      (3) Line 371: BicD2 is required for the transport of RanBP2 from annulate lamellae to nuclear pore complexes.

      Thank you. We have modified our text. 

      (4) Yi et al., 2023 have previously shown changed interactions of the BicD2/R747C mutant, such as decreased binding to Nup358 and increased binding to Nesprin-2, as well as functional implications for the associated brain developmental pathways, which should be acknowledged.

      We apologize for leaving this out. In the original version of the manuscript, we were attempting to keep the discussion more concise. We have added a discussion of these findings in the revised manuscript (lines 496-507).

    1. Author response:

      Reviewer #1 (Public review):

      The study examines how pyruvate, a key product of glycolysis that influences TCA metabolism and gluconeogenesis, impacts cellular metabolism and cell size. It primarily utilizes the Drosophila liver-like fat body, which is composed of large post-mitotic cells that are metabolically very active. The study focuses on the key observations that over-expression of the pyruvate importer MPC complex (which imports pyruvate from the cytoplasm into mitochondria) can reduce cell size in a cell-autonomous manner. They find this is by metabolic rewiring that shunts pyruvate away from TCA metabolism and into gluconeogenesis. Surprisingly, mTORC and Myc pathways are also hyper-active in this background, despite the decreased cell size, suggesting a non-canonical cell size regulation signaling pathway. They also show a similar cell size reduction in HepG2 organoids. Metabolic analysis reveals that enhanced gluconeogenesis suppresses protein synthesis. Their working model is that elevated pyruvate mitochondrial import drives oxaloacetate production and fuels gluconeogenesis during late larval development, thus reducing amino acid production and thus reducing protein synthesis.

      Strengths:

      The study is significant because stem cells and many cancers exhibit metabolic rewiring of pyruvate metabolism. It provides new insights into how the fate of pyruvate can be tuned to influence Drosophila biomass accrual, and how pyruvate pools can influence the balance between carbohydrate and protein biosynthesis. Strengths include its rigorous dissection of metabolic rewiring and use of Drosophila and mammalian cell systems to dissect carbohydrate:protein crosstalk.

      Weaknesses:

      However, questions on how these two pathways crosstalk, and how this interfaces with canonical Myc and mTORC machinery remain. There are also questions related to how this protein:carbohydrate crosstalk interfaces with lipid biosynthesis. Addressing these will increase the overall impact of the study.

      We thank the reviewer for recognizing the significance of our work and for providing constructive feedback. Our findings indicate that elevated pyruvate transport into mitochondria acts independently of canonical pathways, such as mTORC1 or Myc signaling, to regulate cell size. To investigate these pathways, we utilized immunofluorescence with well-validated surrogate measures (p-S6 and p-4EBP1) in clonal analyses of MPC expression, as well as RNA-seq analyses in whole fat body tissues expressing MPC. These methods revealed hyperactivation of mTORC1 and Myc signaling in fat body cells expressing MPC in Drosophila, which are dramatically smaller than control cells. One explanation of these seemingly contradictory observations could be an excess of nutrients that activate mTORC1 or Myc pathways. However, our data is inconsistent with a nutrient surplus that could explain this hyperactivation. Instead, we observed reduced amino acid abundance upon MPC expression, which is very surprising given the observed hyperactivation of mTORC1. This led us to hypothesize the existence of a feedback mechanism that senses inappropriate reductions in cell size and activates signaling pathways to promote cell growth. The best characterized “sizer” pathway for mammalian cells is the CycD/CDK4 complex which has been well studied in the context of cell size regulation of the cell cycle (PMID 10970848, 34022133). However, the mechanisms that sense cell size in post-mitotic cells, such as fat body cells and hepatocytes, remain poorly understood. Investigating the hypothesized size-sensing mechanisms at play here is a fascinating direction for future research.

      For the current study, we conducted epistatic analyses with mTOR pathway members by overexpressing PI3K and knocking down the TORC1 inhibitor Tuberous Sclerosis Complex 1 (Tsc1). These manipulations increased the size of control fat body cells but not those over-expressing the MPC (Supplementary Fig. 3c, 3d). Regarding Myc, its overexpression increased the size of both control and MPC+ clones (Supplementary Fig. 3e), but Myc knockdown had no additional effect on cell size in MPC+ clones (Supplementary Fig. 3f). These results suggest that neither mTORC1, PI3K, nor Myc are epistatic to the cell size effects of MPC expression. Consequently, we shifted our focus to metabolic mechanisms regulating biomass production and cell size.

      When analyzing cellular biomolecules contributing to biomass, we observed a significant impact on protein levels in Drosophila fat body cells and mammalian MPC-expressing HepG2 spheroids. TAG abundance in MPC-expressing HepG2 spheroids and whole fat body cells showed a statistically insignificant decrease compared to controls. Furthermore, lipid droplets in fat body cells were comparable in MPC-expressing clones when normalized to cell size.

      Interestingly, RNA-seq analysis revealed increased expression of fatty acid and cholesterol biosynthesis pathways in MPC-expressing fat body cells. Upregulated genes included major SREBP targets, such as ATPCL (2.08-fold), FASN1 (1.15-fold), FASN2 (1.07-fold), and ACC (1.26-fold). Since mTOR promotes SREBP activation and MPC-expressing cells showed elevated mTOR activity and upregulation of SREBP targets, we hypothesize that SREBP is activated in these cells. Nonetheless, our data on amino acid abundance and its impact on protein synthesis activity suggest that protein abundance, rather than lipids, is likely to play a larger causal role in regulating cell size in response to increased pyruvate transport into mitochondria.

      Reviewer #2 (Public review):

      In this manuscript, the authors leverage multiple cellular models including the drosophila fat body and cultured hepatocytes to investigate the metabolic programs governing cell size. By profiling gene programs in the larval fat body during the third instar stage - in which cells cease proliferation and initiate a period of cell growth - the authors uncover a coordinated downregulation of genes involved in mitochondrial pyruvate import and metabolism. Enforced expression of the mitochondrial pyruvate carrier restrains cell size, despite active signaling of mTORC1 and other pathways viewed as traditional determinants of cell size. Mechanistically, the authors find that mitochondrial pyruvate import restrains cell size by fueling gluconeogenesis through the combined action of pyruvate carboxylase and phosphoenolpyruvate carboxykinase. Pyruvate conversion to oxaloacetate and use as a gluconeogenic substrate restrains cell growth by siphoning oxaloacetate away from aspartate and other amino acid biosynthesis, revealing a tradeoff between gluconeogenesis and provision of amino acids required to sustain protein biosynthesis. Overall, this manuscript is extremely rigorous, with each point interrogated through a variety of genetic and pharmacologic assays. The major conceptual advance is uncovering the regulation of cell size as a consequence of compartmentalized metabolism, which is dominant even over traditional signaling inputs. The work has implications for understanding cell size control in cell types that engage in gluconeogenesis but more broadly raise the possibility that metabolic tradeoffs determine cell size control in a variety of contexts.

      We thank the reviewer for their thoughtful recognition of our efforts, and we are honored by the enthusiasm the reviewer expressed for the findings and the significance of our research. We share the reviewer’s opinion that our work might help to unravel metabolic mechanisms that regulate biomass gain independent of the well-known signaling pathways.

      Reviewer #3 (Public review):

      Summary:

      In this article, Toshniwal et al. investigate the role of pyruvate metabolism in controlling cell growth. They find that elevated expression of the mitochondrial pyruvate carrier (MPC) leads to decreased cell size in the Drosophila fat body, a transformed human hepatocyte cell line (HepG2), and primary rat hepatocytes. Using genetic approaches and metabolic assays, the authors find that elevated pyruvate import into cells with forced expression of MPC increases the cellular NADH/NAD+ ratio, which drives the production of oxaloacetate via pyruvate carboxylase. Genetic, pharmacological, and metabolic approaches suggest that oxaloacetate is used to support gluconeogenesis rather than amino acid synthesis in cells over-expressing MPC. The reduction in cellular amino acids impairs protein synthesis, leading to impaired cell growth.

      Strengths:

      This study shows that the metabolic program of a cell, and especially its NADH/NAD+ ratio, can play a dominant role in regulating cell growth.

      The combination of complementary approaches, ranging from Drosophila genetics to metabolic flux measurements in mammalian cells, strengthens the findings of the paper and shows a conservation of MPC effects across evolution.

      Weaknesses:

      In general, the strengths of this paper outweigh its weaknesses. However, some areas of inconsistency and rigor deserve further attention.

      Thank you for reviewing our manuscript and offering constructive feedback. We appreciate your recognition of the significance of our work and your acknowledgment of the compelling evidence we have presented. We will carefully revise the manuscript in line with the reviewers' recommendations.

      The authors comment that MPC overrides hormonal controls on gluconeogenesis and cell size (Discussion, paragraph 3). Such a claim cannot be made for mammalian experiments that are conducted with immortalized cell lines or primary hepatocytes.

      We appreciate the reviewer’s insightful comment. Pyruvate is a primary substrate for gluconeogenesis, and our findings suggest that increased pyruvate transport into mitochondria increases the NADH-to-NAD+ ratio, and thereby elevates gluconeogenesis. Notably, we did not observe any changes in the expression of key glucagon targets, such as PC, PEPCK2, and G6PC, suggesting that the glucagon response is not activated upon MPC expression. By the statement referenced by the reviewer, we intended to highlight that excess pyruvate import into mitochondria drives gluconeogenesis independently of hormonal and physiological regulation.

      It seems the reviewer might also have been expressing the sentiment that our in vitro models may not fully reflect the in vivo situation, and we completely agree.  Moving forward, we plan to perform similar analyses in mammalian models to test the in vivo relevance of this mechanism. For now, we will refine the language in the manuscript to clarify this point.

      Nuclear size looks to be decreased in fat body cells with elevated MPC levels, consistent with reduced endoreplication, a process that drives growth in these cells. However, acute, ex vivo EdU labeling and measures of tissue DNA content are equivalent in wild-type and MPC+ fat body cells. This is surprising - how do the authors interpret these apparently contradictory phenotypes?

      We thank the reviewer for raising this important issue. The size of the nucleus is regulated by DNA content and various factors, including the physical properties of DNA, chromatin condensation, the nuclear lamina, and other structural components (PMID 32997613). Additionally, cytoplasmic and cellular volume also impacts nuclear size, as extensively documented during development (PMID 17998401, PMID 32473090).

      In MPC-expressing cells, it is plausible that the reduced cellular volume impacts chromatin condensation or the nuclear lamina in a way that slightly decreases nuclear size without altering DNA content. Specifically, in our whole fat body experiments using CG-Gal4 (as shown in Supplementary Figure 2a-c), we noted that after 12 hours of MPC expression, cell size was significantly reduced (Supplementary Figure 2c and Author response image 1A). However, the reduction in nuclear size became significant only after 36 hours of MPC expression (Author response image 1B), suggesting that the reduction in cell size is a more acute response to MPC expression, followed only later by effects on nuclear size.

      In clonal analyses, this relationship was further clarified. MPC-expressing cells with a size greater than 1000 µm² displayed nuclear sizes comparable to control cells, whereas those with a drastic reduction in cell size (less than 1000 µm²) exhibited smaller nuclei (Author response image 1C and D). These observations collectively suggest that changes in nuclear size are more likely to be downstream rather than upstream of cell size reduction. Given that DNA content remains unaffected, we focused on investigating the rate of protein synthesis. Our findings suggest that protein synthesis might play a causal role in regulating cell size, thereby reinforcing the connection between cellular and nuclear size in this context.

      Author response image 1.

      Cell Size vs. Nuclear Size in MPC-Expressing Fat Body Cells. A. Cell size comparison between control (blue, ay-GFP) and MPC+ (red, ay-MPC) fat body cells over time, measured in hours after MPC expression induction. B. Nuclear area measurements from the same fat body cells in ay-GFP and ay-MPC groups. C. Scatter plot of nuclear area vs. cell area for control (ay-GFP) cells, including the corresponding R² value. D. Scatter plot of nuclear area vs. cell area for MPC-expressing (ay-MPC) cells, with the respective R² value.

      This image highlights the relationship between nuclear and cell size in MPC-expressing fat body cells, emphasizing the distinct cellular responses observed following MPC induction.
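
      For readers unfamiliar with the R² values reported for panels C and D, the sketch below shows one common way such a value is obtained. It assumes a simple least-squares line relating nuclear area to cell area, in which case R² equals the squared Pearson correlation; whether the plots used exactly this fit is an assumption, and the function name and example values are hypothetical rather than measured data.

```python
def r_squared(x, y):
    """Coefficient of determination for a simple least-squares line of y on x.

    For a one-predictor linear fit this equals the squared Pearson
    correlation between x and y.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    return (sxy * sxy) / (sxx * syy)


# Made-up areas in arbitrary units, chosen so that y is exactly 2 * x.
cell_area = [1.0, 2.0, 3.0, 4.0]
nuclear_area = [2.0, 4.0, 6.0, 8.0]

fit_quality = r_squared(cell_area, nuclear_area)
```

      A value near 1 means nuclear area scales tightly with cell area, while a value near 0 means the two are largely uncoupled.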

      In Figure 4d, oxygen consumption rates are measured in control cells and those over-expressing MPC. Values are normalized to protein levels, but protein is reduced in MPC+ cells. Is oxygen consumption changed by MPC expression on a per-cell basis?

      As described in the manuscript, MPC-expressing cells are smaller in size. In this context, we felt that it was most appropriate to normalize oxygen consumption rates (OCR) to cellular mass to enable an accurate interpretation of metabolic activity. Therefore, we normalized OCR to protein content to account for variations in cellular size and (probably) mitochondrial mass.
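
      The distinction between the two normalizations can be made concrete with a small sketch. All numbers are invented for illustration and do not come from Figure 4d; the function and field names are likewise hypothetical. The point is only that when protein per cell falls, a per-protein rate and a per-cell rate can move in opposite directions.

```python
def ocr_per_protein(ocr, protein_ug):
    """Oxygen consumption rate normalized to total protein (rate per microgram)."""
    return ocr / protein_ug


def ocr_per_cell(ocr, cell_count):
    """Oxygen consumption rate normalized to the number of cells in the well."""
    return ocr / cell_count


# Hypothetical wells: same cell number, but the MPC+ well has half the protein.
control = {"ocr": 200.0, "protein_ug": 20.0, "cells": 10000}
mpc = {"ocr": 150.0, "protein_ug": 10.0, "cells": 10000}

control_per_protein = ocr_per_protein(control["ocr"], control["protein_ug"])
mpc_per_protein = ocr_per_protein(mpc["ocr"], mpc["protein_ug"])

control_per_cell = ocr_per_cell(control["ocr"], control["cells"])
mpc_per_cell = ocr_per_cell(mpc["ocr"], mpc["cells"])
```

      With these made-up values the per-protein rate is higher in the MPC+ well while the per-cell rate is lower, which is why the choice of denominator matters when protein content itself changes.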

      Trehalose is the main circulating sugar in Drosophila and should be measured in addition to hemolymph glucose. Additionally, the units in Figure 4h should be related to hemolymph volume - it is not clear that they are.

      We appreciate this valuable suggestion. In the revised manuscript, we will quantify trehalose abundance in circulation and within fat bodies. As described in the Methods section, following the approach outlined in Ugrankar-Banerjee et al., 2023, we bled 10 larvae (either control or MPC-expressing) using forceps onto parafilm. From this, 2 microliters of hemolymph were collected for glucose measurement. We will apply this methodology to include the trehalose measurements as part of our updated analysis.

      Measurements of NADH/NAD ratios in conditions where these are manipulated genetically and pharmacologically (Figure 5) would strengthen the findings of the paper. Along the same lines, expression of manipulated genes - whether by RT-qPCR or Western blotting - would be helpful to assess the degree of knockdown/knockout in a cell population (for example, Got2 manipulations in Figures 6 and S8).

      We appreciate this suggestion, which will provide additional rigor to our study. We have already quantified NADH/NAD+ ratios in HepG2 cells under UK5099, NMN, and Asp supplementation, as presented in Figure 6k. As suggested, we will quantify the expression of Got2 manipulations mentioned in Figure 6j using RT-qPCR and validate the corresponding data in Supplementary Figure 8f through western blot analysis.

      Additionally, we will assess the efficiency of pcb, pdha, dlat, pepck2, and Got2 manipulations used to modulate the expression of these genes. These validations will ensure the robustness of our findings and strengthen the conclusions of our study.

    1. Author response:

      Reviewer #1:

      Weaknesses:

      (1) The crystal structure of HsIFT172c reveals a single globular domain formed by the last three TPR repeats and C-terminal residues of IFT172. However, the authors subdivide this globular domain into TPR, linker, and U-box-like regions that they treat as separate entities throughout the manuscript. This is potentially misleading as the U-box surface that is proposed to bind ubiquitin or E2 is not surface accessible but instead interacts with the TPR motifs. They justify this approach by speculating that the presented IFT172c structure represents an autoinhibited state and that the U-box-like domain can become accessible following phosphorylation. However, additional evidence supporting the proposed autoinhibited state and the potential accessibility of the U-box surface following phosphorylation is needed, as it is not tested or supported by the current data.

      We thank the reviewer for this comment. IFT172C contains a TPR region and a Ubox-like region, which are admittedly tightly bound to each other. While it is possible that this region exists and functions as a single domain, we chose to classify these regions as two different domains for the following reasons.

      (1) TPR and Ubox-like regions are two different structural classes

      (2) TPR region is linked to Ubox-like region via a long linker which seems poised to regulate the relative movement between these regions.

      (3) Many ciliopathy mutations are mapped to the interface of TPR region and the Ubox region hinting at a regulatory mechanism governed by this interface.

      (2) While in vitro ubiquitination of IFT172 has been demonstrated, in vivo evidence of this process is necessary to support its physiological relevance.

      We thank the reviewer for this comment. We are currently working on identifying the substrates of IFT172 to reveal the physiological relevance of its ubiquitination activity.

      (3) The authors describe IFT172 as being autoubiquitinated. However, the identified E2 enzymes UBCH5A and UBCH5B can both function in E3-independent ubiquitination (as pointed out by the authors) and mediate ubiquitin chain formation in an E3-independent manner in vitro (see ubiquitin chain ladder formation in Figure 3A). In addition, point mutation of known E3-binding sites in UBCH5A or TPR/U-box interface residues in IFT172 has no effect on the mono-ubiquitination of IFT172c1. Together, these data suggest that IFT172 is an E3-independent substrate of UBCH5A in vitro. The authors should state this possibility more clearly and avoid terminology such as "autoubiquitination" as it implies that IFT172 is an E3 ligase, which is misleading. Similarly, statements on page 10 and elsewhere are not supported by the data (e.g. "the low in vitro ubiquitination activity exhibited by IFT172" and "ubiquitin conjugation occurring on HsIFT172C1 in the presence of UBCH5A, possibly in coordination with the IFT172 U-box domain").

      We now consider this possibility and tone down our statements about the autoubiquitination activity of IFT172 in a revised version of the manuscript.

      (4) Related to the above point, the conclusion on page 11, that mono-ubiquitination of IFT172 is U-box-independent while polyubiquitination of IFT172 is U-box-dependent appears implausible. The authors should consider that UBCH5A is known to form free ubiquitin chains in vitro and structural rearrangements in F1715A/C1725R variants could render additional ubiquitination sites or the monoubiquitinated form of IFT172 inaccessible/unfavorable for further processing by UBCH5A.

      We now consider this possibility and tone down our statements about the autoubiquitination activity of IFT172 in the conclusion on pg. 11.

      (5) Identification of the specific ubiquitination site(s) within IFT172 would be valuable as it would allow targeted mutation to determine whether the ubiquitination of IFT172 is physiologically relevant. Ubiquitination of the C1 but not the C2 or C3 constructs suggests that the ubiquitination site is located in TPRs ranging from residues 969-1470. Could this region of TPR repeats (lacking the IFT172C3 part) suffice as a substrate for UBCH5A in ubiquitination assays?

      We thank the reviewer for raising this important point about ubiquitination site identification. While not included in our manuscript, we did perform mass spectrometry analysis of ubiquitination sites using wild-type IFT172 and several mutants (P1725A, C1727R, and F1715A). As shown in the figure below, we detected multiple ubiquitination sites across these constructs. The wild-type protein showed ubiquitination at positions K1022, K1237, K1271, and K1551, while the mutants displayed slightly different patterns of modification. However, we should note that the MS intensity signals for these ubiquitinated peptides were relatively low compared to unmodified peptides, making it difficult to draw strong conclusions about site specificity or physiological relevance.

      Author response image 1.

      These results align with the reviewer's suggestion that ubiquitination occurs within the TPR-containing region. However, given the technical limitations of the MS analysis and the potential for E3-independent ubiquitination by UBCH5A, we have taken a conservative approach in interpreting these findings.

      (6) The discrepancy between the molecular weight shifts observed in anti-ubiquitin Western blots and Coomassie-stained gels is noteworthy. The authors show the appearance of a mono-ubiquitinated protein of ~108 kDa in anti-ubiquitin Western blots. However, this molecular weight shift is not observed for total IFT172 in the corresponding Coomassie-stained gels (Figures 3B, D, F). Surprisingly, this MW shift is visible in an anti-His Western blot of a ubiquitination assay (Fig 3C). Together, this raises the concern that only a small fraction of IFT172 is being modified with ubiquitin. Quantification of the percentage of ubiquitinated IFT172 in the in vitro experiments could provide helpful context.

      We do acknowledge in the manuscript that the conjugation of ubiquitin to IFT172C is weak (Page 16). Future experiments identifying potential substrates and their implications for ciliary regulation will provide further context for our in vitro ubiquitination experiments.

      (7) The authors propose that IFT172 binds ubiquitin and demonstrate that GST-tagged HsIFT172C2 or HsIFT172C3 can pull down tetra-ubiquitin chains. However, ubiquitin is known to be "sticky" and to have a tendency for weak, nonspecific interactions with exposed hydrophobic surfaces. Given that only a small proportion of the ubiquitin chains bind in the pull-down, specific point mutations that identify the ubiquitin-binding site are required to convincingly show the ubiquitin binding of IFT172.

      (8) The authors generated structure-guided mutations based on the predicted Ub-interface and on the TPR/U-box interface and used these for the ubiquitination assays in Fig 3. These same mutations could provide valuable insights into ubiquitin binding assays as they may disrupt or enhance ubiquitin binding (by relieving "autoinhibition"), respectively. Surprisingly, two of these sites are highlighted in the predicted ubiquitin-binding interface (F1715, I1688; Figure 4E) but not analyzed in the accompanying ubiquitin-binding assays in Figure 4.

      We agree that these mutations could provide insights into ubiquitin binding by IFT172, and we are currently pursuing further mutagenesis studies on the IFT172-Ub interface based on the AF model. We have, however, evaluated the ubiquitin-binding activity of the F1715A mutant using similar pulldowns, and the mutation had no significant impact on the ubiquitin-binding activity of IFT172. We have yet to evaluate the impact of alternative amino acid substitutions at these positions. The I1688 mutants we cloned could not be expressed in soluble form and thus could not be tested in ubiquitination activity or ubiquitin-binding assays.

      (9) If IFT172 is a ubiquitin-binding protein, it might be expected that the pull-down experiments in Figure S1 would identify ubiquitin, ubiquitinated proteins, or E2 enzymes. These were not observed, raising doubt that IFT172 is a ubiquitin-binding protein.

      It is likely that IFT172 binds ubiquitin only with low affinity, as indicated by our in vitro pulldowns and the AF interface. In our pulldown experiment using Chlamydomonas flagella extracts, we used extensive washes to remove non-specific interactors. This might also have excluded weak but bona fide interactors of IFT172. Additionally, we did not use any ubiquitination-preserving reagents such as NEM in our pulldown buffers, exposing the cellular ubiquitinated proteins to DUB-mediated proteolysis and further preventing their identification in our pulldown/MS experiment.

      (10) The cell-based experiments demonstrate that the U-box-like region is important for the stability of IFT172 but does not demonstrate that the effect on the TGFb pathway is due to the loss of ubiquitin-binding or ubiquitination activity of IFT172.

      We acknowledge that our current data cannot distinguish whether the TGFβ pathway defects arise from general protein instability or from specific loss of ubiquitin-related functions. Our experiments demonstrate that the U-box-like region is required for both IFT172 stability and proper TGFβ signaling, but we agree that establishing a direct mechanistic link between these phenomena would require additional evidence. We will revise our discussion to more clearly acknowledge this limitation in our current understanding of the relationship between IFT172's U-box region and TGFβ pathway regulation.

      (11) The challenges in experimentally validating the interaction between IFT172 and the UBX-domain-containing protein are understandable. Alternative approaches, such as using single domains from the UBX protein, implementing solubilizing tags, or disrupting the predicted binding interface in Chlamydomonas flagella pull-downs, could be considered. In this context, the conclusion on page 7 that "The uncharacterized UBX-domain-containing protein was validated by AF-M as a direct IFT172 interactor" is incorrect as a prediction of an interaction interface with AF-M does not validate a direct interaction per se.

      We agree with the reviewer that our AlphaFold-Multimer (AF-M) predictions alone do not constitute experimental validation of a direct interaction. We appreciate the reviewer's understanding of the technical challenges in validating this interaction experimentally. We will revise our text to more precisely state that "The uncharacterized UBX-domain-containing protein was validated by AF-M as a potential direct IFT172 interactor" and will discuss the AF-M predictions as computational evidence that suggests, but does not prove, a direct interaction. This more accurately reflects the current state of our understanding of this potential interaction.

      Reviewer #3:

      Weaknesses:

      (1) Interaction studies were carried out by pulldown experiments, which identified more IFT172 interaction partners. Whether these interactions can be seen in living cells remains to be elucidated in subsequent studies.

      We agree with the reviewer that validation of protein-protein interactions in living cells provides important physiological context. While our pulldown experiments have identified several promising interaction partners and the AF-M predictions provide computational support for these interactions, we acknowledge that demonstrating these interactions in vivo would strengthen our findings. However, we believe our current biochemical and structural analyses provide valuable insights into the molecular basis of IFT172's interactions, laying important groundwork for future cell-based studies.

      (2) The cell culture-based experiments in the IFT172 mutants are exciting and show that the U-box domain is important for protein stability and point towards involvement of the U-box domain in cellular signaling processes. However, the characterization of the generated cell lines falls behind the very rigorous analysis of other aspects of this work.

      We thank the reviewer for noting that the characterization of our cell lines could be more rigorous. In the revised manuscript, we will provide additional characterization of the cell lines, including detailed sequencing information and validation data for the IFT172 mutants. This will bring the documentation of our cell-based experiments up to the same standard as other aspects of our work.

    1. Author response:

      We thank the reviewers for their help and their suggestions to make this manuscript more rigorous. We would like to post provisional author responses when eLife publishes the reviewed preprint; more detailed responses will accompany the revised manuscript.

      • There are questions about choices made in the computational approach (architecture and type of generative model, training set).

      We will train a new generator model based on the current GAN architecture, but with ‘hybrid’ AMP/AVP training sets (Reviewers 1 and 3). This will let us directly compare the performance of the two generators. Based on our preliminary data, providing the GAN with more AVP sequences during training helped the designed peptides pass the AVP filter, at the cost of reducing the average AMPredictor scores. The new generator also increased the diversity of the designed sequences.

      We also perturbed the detailed architecture of our deep learning models, including fully-connected graph edge encodings and different versions of ESM (e.g. esm1b_t33_650M_UR50S, esm2_t48_15B_UR50D; Reviewer 2). In the revised manuscript, we will report the effects of these modifications and suggest that the overall GCN and GAN constructs are suitable for a light-weight sequence label model, as demonstrated in Author response tables 1 and 2. For the generator, we suggest that, using our approach, we may have reached a plateau for the GAN sampling (Author response table 3).

      Author response table 1.

      Results of AMPredictor with different graph edge encodings

      Author response table 2.

      Results of AMPredictor with different ESM versions

      Author response table 3.

      Evaluation of generated sequences with different sampling numbers

      • There is an important concern about the small number of antimicrobial peptides tested, compared to other studies, and the origin of antiviral activities.

      We will address this concern by increasing the number of peptides tested in anti-microbial and anti-viral experiments. As reported in the current version of our manuscript, the first generation of the GAN generated 128 unique designs and the top 2% (3 designs) were tested experimentally. The second generation of the GAN will produce ~1024 designs (1-2 weeks) and the top 2% (~20 new sequences) will be tested. Peptide synthesis (2-3 weeks) and MIC measurement (1 week) are in progress. The overall size of the tested sample will reach 20-30 sequences. We will focus on sequences with low similarity (< 30%) to any known AMPs, thus expanding the universe of functional peptides. We estimate that collecting these new data will take 6 weeks.
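
      The selection counts and timeline quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name is hypothetical, and rounding the top 2% to the nearest whole design is an assumption, since the response does not state how the cutoff is rounded.

```python
def top_fraction_count(n_designs: int, fraction: float = 0.02) -> int:
    """Number of designs in the top `fraction` (rounded to nearest; an assumption)."""
    return round(n_designs * fraction)

# Design counts quoted in the response.
first_gen = top_fraction_count(128)    # 128 * 0.02 = 2.56
second_gen = top_fraction_count(1024)  # 1024 * 0.02 = 20.48
total_tested = first_gen + second_gen

# Stage durations in weeks as (minimum, maximum), taken from the response.
stages = {
    "second-generation design": (1, 2),
    "peptide synthesis": (2, 3),
    "MIC measurement": (1, 1),
}
total_min = sum(low for low, _ in stages.values())
total_max = sum(high for _, high in stages.values())

print(f"tested designs: {first_gen} + {second_gen} = {total_tested}")
print(f"estimated duration: {total_min}-{total_max} weeks")
```

      Under these assumptions the total lands inside the stated 20-30 sequences, and the upper bound of the stage durations matches the 6-week estimate.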

    1. Author response:

      Reviewer #1 (Public Review):

      (1) Figure 3: it is unclear what is the efficiency of Msi2 deletion shRNA - could you demonstrate it by at least two independent methods? (QPCR, Western, or IHC?) please quantitate the data.

      In Figure 3, we did not delete Msi2 via shRNA. Instead, we utilized a genetic model in which the Msi2 gene was disrupted via gene trap mutagenesis. We have also used this model in previous publications to define the impact of Msi2 loss in other systems (ref. 1).

      (2) In Figure 4, similarly, it is unclear if Msi2 depletion was effective- and what is shRNA efficiency. Please test this by at least two independent methods (QPCR, Western, or IHC) and also please quantitate the data

      We demonstrated that the efficiency of Msi2 depletion was ~83% (Figures 4A and 4C) via qPCR analysis for our in vitro and in vivo experiments, respectively, and verified the knockdown via bulk RNA-seq analysis. The shRNA hairpin used was previously validated and published by our lab (ref. 2).
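
      For readers unfamiliar with how a knockdown percentage is derived from qPCR, the sketch below uses the common 2^-ΔΔCt calculation. This is an assumption for illustration: the response does not state which quantification method was used, the Ct values are hypothetical, and the formula assumes roughly 100% amplification efficiency for both target and reference genes.

```python
def relative_expression(ct_target_kd: float, ct_ref_kd: float,
                        ct_target_ctrl: float, ct_ref_ctrl: float) -> float:
    """Fold expression of the target in knockdown vs control (2^-ddCt)."""
    delta_kd = ct_target_kd - ct_ref_kd        # target normalized to reference, knockdown
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl  # target normalized to reference, control
    return 2.0 ** -(delta_kd - delta_ctrl)

def knockdown_efficiency(ct_target_kd: float, ct_ref_kd: float,
                         ct_target_ctrl: float, ct_ref_ctrl: float) -> float:
    """Fraction of target transcript lost after knockdown."""
    return 1.0 - relative_expression(ct_target_kd, ct_ref_kd,
                                     ct_target_ctrl, ct_ref_ctrl)

# Hypothetical Ct values: the target shifts by 2.5 cycles, the reference is unchanged.
efficiency = knockdown_efficiency(ct_target_kd=26.5, ct_ref_kd=18.0,
                                  ct_target_ctrl=24.0, ct_ref_ctrl=18.0)
print(f"knockdown efficiency: {efficiency:.1%}")
```

      With these made-up numbers, a shift of about 2.5 cycles corresponds to a knockdown in the low 80% range, the same order as the efficiency reported above.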

      (3) the reason for impairment of cell growth demonstrated in Figs 3 and 4 is not clear: is it apoptosis? Necrosis? Cell cycle defects? Autophagy? Senescence? Please probe 2-3 possibilities and provide the data.

      The basis of the cell growth impairment after Msi2 deletion/knockdown in this paper is certainly an important question, and future experiments will be performed to better delineate this. In previous publications, loss of Msi2 in leukemia cells has been shown to inhibit growth via arrested cell cycle progression by increasing the expression of p21 (ref. 3). Further, loss of Msi2 was also shown to promote apoptosis in part by upregulating Bax (ref. 3). These data suggest that Msi2 can have an impact via multiple distinct mechanisms, including by mediating cell cycle arrest and blocking apoptosis. While these specific genes were not detectably changed after loss of Msi2 in lung cancer cells, other genes in these and other pathways will be important to study in the future.

      (4) Since Musashi-1 is a Musashi-2 paralogue that could compensate for Musashi-2 loss, please test Msi1 expression levels in matching Fig 3 and Fig 4 sections (in cells/ tumors with Msi2 deletion and in KP cells with Msi2 shRNA). One method could suffice here.

      In our RNA-seq of cells following Msi2 knockdown, Msi1 expression was undetectable. The TPM values for Msi1 in control and knockdown cells were less than 0.01, suggesting that it did not compensate for the loss of Msi2.
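
      As background for the TPM figures quoted above, the sketch below shows the standard transcripts-per-million normalization and a detection check at the 0.01 cutoff. The read counts and gene lengths are hypothetical and do not come from the study, and treating 0.01 TPM as a detection threshold is an assumption made for illustration.

```python
def tpm(counts: list[float], lengths_kb: list[float]) -> list[float]:
    """Transcripts per million from raw read counts and gene lengths in kilobases."""
    if len(counts) != len(lengths_kb):
        raise ValueError("counts and lengths_kb must have the same length")
    # Reads per kilobase for each gene.
    rates = [count / length for count, length in zip(counts, lengths_kb)]
    total = sum(rates)
    if total == 0:
        raise ValueError("no reads in sample")
    # Scale so that the values sum to one million within the sample.
    return [1e6 * rate / total for rate in rates]

# Hypothetical three-gene sample (values are illustrative only).
genes = ["Msi1", "Msi2", "Actb"]
counts = [0, 850, 120000]
lengths_kb = [2.9, 6.4, 1.9]

values = dict(zip(genes, tpm(counts, lengths_kb)))
detected = {gene: value >= 0.01 for gene, value in values.items()}
print(values)
print(detected)
```

      Because TPM is normalized within each sample, a value this close to zero in both control and knockdown samples indicates that the transcript is essentially absent rather than merely unchanged.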

      (5) It is not exactly clear why RNA-seq (as opposed to proteomics) was done to investigate downstream Msi2 targets (since Msi2 is in first place, translational and not transcriptional regulator)- . RNA effects in Fig 5J are quite modest, 2-fold or so. It would be useful (if antibodies available) to test four targets in Fig 5J by Western blot, to see any impact of musashi-2 depletion on those target protein levels. Indeed, several papers - including Kudinov et al PNAS, PMID: 27274057, Makhov P et al PMID: 33723247 and PMID: 37173995 - used proteomics/ RIP approaches and found direct Musashi-2 targets in lung cancer, including EGFR, and others.

      Previously published work from the lab showed that expression of Msi2 in the context of myeloid leukemia (ref. 1) can repress not only NUMB protein (as has been previously demonstrated in the nervous system) but also Numb RNA. This indicated that, as an RNA-binding protein, Msi2 can also bind and destabilize direct binding targets such as Numb; this was the reason for pursuing transcriptomic analysis. However, as the reviewer suggests, proteomic studies are certainly very important for developing a complete picture of the impact of Musashi and for determining which targets are controlled by Msi2 at the protein level.

      Reviewer #2 (Public Review):

      (1) It will be interesting to determine whether Msi2+ cells are a relatively stable subset or rather the Msi2+ cells in lung is a dynamic concept that is transient or interconvertible. This is relevant to the interpretation of what Msi2 positivity really means.

      In previous unpublished work from our lab, we have found that Msi2+ cells from a GFP reporter KPf/fC mouse are readily able to become GFP negative (Msi2-), but the inverse is not true. Specifically, when Msi2+ KPf/fC pancreatic cells were transplanted into the flanks of NSG mice, Msi2+ cells formed tumors in all recipients; these tumors contained both GFP+ and GFP- cells (over 80%), recapitulating the original heterogeneity and suggesting that GFP+ cells can give rise to both GFP+ and GFP- cells (Lytle and Reya, unpublished observations). In contrast, only a small subset of mice transplanted with GFP- cells formed tumors. One of the rare GFP- derived tumors was isolated and found to contain largely GFP- cells, with ~0.1% GFP+ cells. The small frequency of GFP expression could be from contaminating cells or may suggest that GFP- cells retain some ability to switch on Msi under selective pressure, and that although they pose a lower risk of driving tumorigenesis than Msi+ cells, they may nonetheless bear latent potential to become higher risk. These data may offer a possible model for projecting the potential of Msi2+ cells in the lung, but this needs to be studied further in this tissue.

      (2) Does Kras mutation and/or p53 loss upregulate Msi2? This point and the point above are related to whether Msi2+ cells are truly more susceptible to tumorigenesis, as the authors suggested.

      In unpublished work from our lab, we have found that Kras mutation upregulates Msi2 over baseline and that subsequent p53 loss upregulates Msi2 further in the context of pancreatic cells (Lytle and Reya, unpublished results); it is therefore possible that the same is true for the lung. Specifically, we have observed that Msi2 increased from normal acinar cells to Kras-mutated acinar cells (e.g. pancreatic intraepithelial neoplasia (PanIN)).

      To address whether Msi2+ cells are more susceptible to tumorigenesis, we have recently published data showing that the stabilization of the oncogenic MYC protein in lung Msi2+ cells drives the formation of small-cell lung cancer in a new inducible Msi2-CreERT2; CAG-LSL-MycT58A (Msi2-Myc) mouse model (ref. 4). More importantly, these data provide the first evidence that normal Msi2+ cells are primed and highly sensitive to MYC-driven transformation across many organs and not just the lung (ref. 4).

      (3) The KO of Msi2 reducing tumor number and burden in the lung cancer initiation model is interesting. However, there are two alternative interpretations. First, it is possible that the Msi2 KO mice (without Kras activation and p53 loss) has reduced total lung cell numbers or altered percentage of stem cells. There is currently only one sentence citing data not shown on line 125, commenting that there is no difference in BASC and AT2 cell populations. It will be helpful that such data are shown and the effect of KO on overall lung mass or cellularity is clarified. Second, the phenotype may also be due to a difference in the efficiencies of cre on Kras and p53 in the Msi2 WT and KO mice.

      We isolated the lungs of three Msi2 WT and three Msi2 KO mice and used immunofluorescence staining for CC10 (BASC) and SPC (AT2) to determine whether these cell populations were reduced after Msi2 loss alone. Below are representative images showing that the Msi2 KO mice did not have lower numbers of either BASC or AT2 cell populations.

      Author response image 1.

      (4) All shRNA experiments (for both Msi2 KD and the KD of candidate genes) utilized a single shRNA. This approach cannot exclude off-target effects of the shRNA.

      The shRNA hairpin used for Msi2 was previously validated and published by our lab (ref. 2). Additionally, in this work we did develop and use a Msi2 genetic knockout mouse model that validates our shRNA knockdown data showing the specific impact of Msi2 on lung tumor growth.

      (5) The technical details of the PDX experiment (Figure 4F) are not fully explained.

      Due to space considerations, we were unable to put the specifics in the legend, but the details are in the methods section (Flank Transplant Assays). In brief, 500,000 cells/well were plated in a 6-well plate coated with Matrigel and 83,000 cells/well were plated in a 24-well plate coated with Matrigel for subsequent determination of transduction efficiency via FACS. 24 hours after transduction, media from the cells was collected and placed on ice. 1 mL of 2 mg/mL collagenase/dispase was then added to the well and incubated for 45 minutes at 37 ºC to dissociate the remaining cells from Matrigel, followed by subsequent washes. Cells were pelleted by centrifugation and an equivalent number of shControl and shMsi2 transduced cells were resuspended in full media, mixed at a 1:1 ratio with growth factor reduced Matrigel at a final volume of 100 μL, and transplanted subcutaneously into the flanks of NSG recipient mice.

      Reviewer #3 (Public Review):

      - In Figure 1, characterization of Msi2 expression in the normal mouse lung was carried out by using a Msi2-GFP Knock-in reporter and analyzed by flow cytometry followed by cytospins and immunostaining. Additional characterization of Msi2 expression by co-immunostaining with well-known markers of airway and alveolar cell types in intact lung tissue will strengthen the existing data and provide more specific information about Msi2 expression and abundancy in relevant cell types. It will be also interesting to know whether Msi2 is expressed or not in other abundant lung cell types such as ciliated and AT1 cells.

      We performed co-staining of Msi2 and CC10 as well as Msi2 and SPC in Figure 1C. In the future we can include additional markers as well as markers for airway and other alveolar cell types.

      - While this set of experiments provide strong evidence that Msi2 is required for tumor progression and growth in lung adenocarcinoma, it is unclear whether normal Msi2+ lung cells are more responsive to transformation or whether Msi2 is upregulated early during the process of tumorigenesis. Future lineage tracing experiments using Msi2-CreER and mouse models of chemically-induced lung carcinogenesis will provide additional data that will fully support this claim.

      Recently, we published data showing that Msi2 is expressed in Clara cells at the bronchoalveolar junction in the lung of our new Msi2-CreERT2 knock-in mouse model (ref. 4). Furthermore, stabilization of the oncogenic MYC protein in these specific cells to model Myc amplification was sufficient to drive the formation of small-cell lung cancer (ref. 4). These data excitingly demonstrate that Msi2+ cells are more responsive to transformation after Myc stabilization.

      - In Figure 4F, Patient-derived xenograft (PDX) assays were conducted in 2 patients only and the percentage of cells infected by shRNA-Msi2 is low in both PDX (30% and 10% for patient 1 and 2 respectively). It is surprising that Msi2 downregulation in a small percentage of tumor cells has such a dramatic effect on tumor growth and expansion. Confirmation of this finding with additional patient samples would suggest an important non-cell autonomous role for Msi2 in lung adenocarcinoma.

      In the future we hope to collect more patient samples to further validate the data presented with the first 2 patients shown here. We are not certain about the reason behind the large impact of Msi2 inhibition, but as cancer stem cells drive the formation of the rest of the tumor and also drive the stromal microenvironment, it is possible that when Msi2 is deleted, Msi2- cells no longer form tumors and the ability to build the stromal microenvironment is also impaired. This possibility needs to be further tested in future experiments.

      References

      (1) Ito, T., Kwon, H. Y., Zimdahl, B., Congdon, K. L., Blum, J., Lento, W. E., Zhao, C., Lagoo, A., Gerrard, G., Foroni, L., Goldman, J., Goh, H., Kim, S. H., Kim, D. W., Chuah, C., Oehler, V. G., Radich, J. P., Jordan, C. T., & Reya, T. Regulation of myeloid leukaemia by the cell-fate determinant Musashi. Nature 466, 765–768 (2010).

      (2) Fox, R. G., Lytle, N. K., Jaquish, D. V., Park, F. D., Ito, T., Bajaj, J., Koechlein, C. S., Zimdahl, B., Yano, M., Kopp, J. L., Kritzik, M., Sicklick, J. K., Sander, M., Grandgenett, P. M., Hollingsworth, M. A., Shibata, S., Pizzo, D., Valasek, M. A., Sasik, R., Scadeng, M., Okano, H., Kim, Y., MacLeod, A. R., Lowy, A. M., & Reya, T. Image-based detection and targeting of therapy resistance in pancreatic adenocarcinoma. Nature 534, 407–411 (2016).

      (3) Zhang, H., Tan, S., Wang, J., Chen, S., Quan, J., Xian, J., Zhang, Ss., He, J., & Zhang, L. Musashi2 modulates K562 leukemic cell proliferation and apoptosis involving the MAPK pathway. Exp Cell Res 320, 119–127 (2014).

      (4) Rajbhandari, N., Hamilton, M., Quintero, C. M., Ferguson, L. P., Fox, R., Schürch, C. M., Wang, J., Nakamura, M., Lytle, N. K., McDermott, M., Diaz, E., Pettit, H., Kritzik, M., Han, H., Cridebring, D., Wen, K. W., Tsai, S., Goggins, M. G., Lowy, A. M., Wechsler-Reya, R. J., Von Hoff, D. D., Newman, A. M., & Reya, T. Single-cell mapping identifies MSI+ cells as a common origin for diverse subtypes of pancreatic cancer. Cancer Cell 41, 1989–2005.e9 (2023).

    1. Author Response

      Reviewer #1 (Public Review):

      1) “It is unclear whether new in vivo experiments were conducted for this study”.

      All in vivo experiments shown were conducted independently by new researchers in the lab, using the original fly stocks. This will be more clearly stated in the revised supplement. The aim of repeating the experiments was to directly compare the consequences of impaired N- and C-terminal shedding side-by-side in two Hh-dependent developmental systems.

      2) “A critical shortcoming of the study is that experiments showing Shh secretion/export do not include a Shh(-) control condition. Without demonstration that the bands analyzed are specific for Shh(+) conditions, these experiments cannot be appropriately evaluated”.

      C9C5 antibody reactivity and specificity are shown below, and this control will be added to the revised manuscript. We established the C9C5 immunoblotting protocol, and generated the blot shown in Author Response Image 1, before any of the experiments in the manuscript were started. The immunoblot clearly shows Shh specificity similar to that of R&D AF464 anti-Shh antibodies that were previously used in the lab. The immunoblot also shows that both antibodies detect the same Shh signals in media, that C9C5 is more sensitive, and that AF464 and C9C5 detect 5E1-IP’d dual-lipidated and monolipidated soluble Shh equally well. Also note that, in our hands, C9C5 is highly specific: this antibody detects N-truncated C25S;Δ26-35Shh of increased electrophoretic mobility, but does not cause nonspecific signals above or below, even if the blot is strongly overexposed (as shown here). Specific Shh detection by C9C5 is also discussed in our response to the editor’s comments below.

      Cells were transfected with constructs encoding full-length C25SShh or truncated C25S;Δ26-35Shh, and proteins in serum-containing media were 5E1 immunoprecipitated or concentrated by heparin-sepharose pulldown. Dual-lipidated R&D 8908-SH was dissolved in the same medium and subjected to the same 5E1 immunoprecipitation or heparin pulldown. The blot was incubated with antibody AF464 and (after stripping) with antibody C9C5. Immunoblot analysis revealed high specificity of both antibodies and also revealed poor interactions of dual-lipidated 8908-SH with highly charged heparin.

      3) “A stably expressing Shh/Hhat cell line would reduce condition to condition and experiment to experiment variability”.

      We fully agree with this reviewer and therefore aimed to establish stable Hhat-expressing cell lines several years ago. However, stable Hhat expression eliminated transfected cells after several passages, or cells gradually ceased to express Hhat, preventing us from establishing a stable line despite several attempts and strategies. For this reason, we established transient co-expression of Shh/Hhat from the same mRNA to at least eliminate variability between relative Shh/Hhat expression levels and to ensure complete Shh palmitoylation in our assays.

      4) “Unusual normalization strategies are used for many experiments, and quantification/statistical analyses are missing for several experiments”.

      This comment refers to data shown in Fig. 3 (here, no quantification of Scube2 function in Disp-/- cells had been conducted) and to qPCR data shown in Fig. 4 (here, Shh and C25AShh were compared only indirectly via dual-lipidated R&D 8908-SH, but not directly in a side-by-side experiment, and Shh variants with an N-terminal alanine or a serine were directly compared). We agree with the reviewer and are therefore currently repeating qPCR assays and quantifying blots to eliminate these technical shortcomings from the final manuscript.

      5) “The study provides a modest advance in the understanding of the complex issue of Shh membrane extraction”

      Our investigation identified unexpected links between Disp as a furin-activated Hh exporter, sheddase-mediated Shh release, Scube2-mediated Shh release and lipoprotein-mediated Hh transport – established modes indeed but with no previously established direct connections – that increase their relevance. We also identified a previously unknown N-processed Shh variant attached to lipoproteins and show that Disp/Scube2 function absolutely requires lipoproteins. Therefore, although we do agree that our findings are confirmatory for the above modes, they also provide new mechanistic insight and challenge the currently dominating model of Disp-mediated hand-over of dual-lipidated Hh to Scube2 chaperones (this model does not predict a role for lipoprotein particles but for both Shh lipids in signaling, for a recent discussion, see PMID 36932157). Our findings suggest an answer to the intensely debated question of whether Disp/Ptch extract cholesterol from the outer or inner plasma membrane leaflet, and suggest that N-palmitate is dispensable for signaling of lipoprotein-associated Shh to Ptch receptors. Finally, we note that previous in vivo studies in flies often relied on Hh overexpression in the fat body, raising questions on their physiological relevance. Our in vivo analyses of Hh function in wing- and eye discs are more physiologically relevant and can explain the previously reported presence of non-lipidated bioactive Hh in disc tissue (PMID: 23554573).

      Reviewer #2 (Public Review):

      1) “However, the results concerning the roles of lipoproteins and Shh lipid modifications are largely confirmatory of previous results, and molecular identity/physiological relevance of the newly identified Shh variant remain unclear”.

      Regarding the confirmatory aspects of our work, please also refer to our response to reviewer 1. In addition, we would like to reply that our unbiased experimental approach was designed to challenge the model of Shh shedding by testing whether established Shh release regulators affect it (e.g. support it) or not. As described in our work, Disp, Scube2 and lipoproteins all contribute to increased shedding (which is new), Disp function depends on lipoprotein presence (also new), and lipoproteins modify the outcome of Shh shedding (dual Shh shedding versus N-shedding and lipoprotein association), which is also new.

      Regarding physiological relevance, we would like to reply that our finding that artificially generated monolipidated variants (C25SShh and ShhN) solubilize in an uncontrolled manner from producing cells can explain previously observed, highly variable gain-of-function or loss-of-function phenotypes upon their overexpression in vivo 1, 2, 3, 4, 5. Our data are also supported by the observed presence of variably lipidated Shh/Hh variants in vivo 6, and the in vivo observation that complete removal of Scube activity in zebrafish embryos phenocopies a complete loss of Hh function that is bypassed by increased ligand expression - and even results in wild-type-like ectopic Shh target gene expression 7. The in vivo observations are compatible with our data but are incompatible with proposed alternative models of Scube-mediated dual-lipidated Shh extraction and continued Shh/Scube association to allow for morphogen transport.

      2) “Thus, it would be important to demonstrate key findings in cells that secrete Shh endogenously”.

      Experimental data shown in Fig. S8B demonstrates that en-controlled expression of sheddase-resistant Hh variants blocks endogenous Hh function in the same wing disc compartment. To our knowledge, this assay is the most physiologically relevant test of the mechanism of Disp-mediated Hh release. Still, we have now started to analyze Hh from Drosophila disc tissue biochemically and hope that we can include our findings in the final manuscript.

      3) “The authors could use an orthogonal approach, optimally a demonstration of physical interaction, or at least fractionation by a different parameter”.

      We agree with this reviewer’s assessment and are currently in the process of establishing co-IP and density gradient conditions to test physical HDL/Shh interactions. The results will be included in the final version of record.

    1. Author Response

      eLife assessment

      This study presents potentially valuable results on glutamine-rich motifs in relation to protein expression and alternative genetic codes. The author's interpretation of the results is so far only supported by incomplete evidence, due to a lack of acknowledgment of alternative explanations, missing controls and statistical analysis and writing unclear to non experts in the field. These shortcomings could be at least partially overcome by additional experiments, thorough rewriting, or both.

      We thank both the Reviewing Editor and Senior Editor for handling this manuscript and will submit our revised manuscript after the reviewed preprint is published by eLife.  

      Reviewer #1 (Public Review):

      Summary

      This work contains 3 sections. The first section describes how protein domains with SQ motifs can increase the abundance of a lacZ reporter in yeast. The authors call this phenomenon autonomous protein expression-enhancing activity, and this finding is well supported. The authors show evidence that this increase in protein abundance and enzymatic activity is not due to changes in plasmid copy number or mRNA abundance, and that this phenomenon is not affected by mutants in translational quality control. It was not completely clear whether the increased protein abundance is due to increased translation or to increased protein stability.

      In section 2, the authors performed mutagenesis of three N-terminal domains to study how protein sequence changes protein stability and enzymatic activity of the fusions. These data are very interesting, but this section needs more interpretation. It is not clear if the effect is due to the number of S/T/Q/N amino acids or due to the number of phosphorylation sites.

      In section 3, the authors undertake an extensive computational analysis of amino acid runs in 27 species. Many aspects of this section are fascinating to an expert reader. They identify regions with poly-X tracks. These data were not normalized correctly: I think that a null expectation for how often poly-X track occur should be built for each species based on the underlying prevalence of amino acids in that species. As a result, I believe that the claim is not well supported by the data.

      Strengths

      This work is about an interesting topic and contains stimulating bioinformatics analysis. The first two sections, where the authors investigate how S/T/Q/N abundance modulates protein expression level, is well supported by the data. The bioinformatics analysis of Q abundance in ciliate proteomes is fascinating. There are some ciliates that have repurposed stop codons to code for Q. The authors find that in these proteomes, Q-runs are greatly expanded. They offer interesting speculations on how this expansion might impact protein function.

      Weakness

      At this time, the manuscript is disorganized and difficult to read. An expert in the field, who will not be distracted by the disorganization, will find some very interesting results included. In particular, the order of the introduction does not match the rest of the paper.

      In the first and second sections, where the authors investigate how S/T/Q/N abundance modulates protein expression levels, it is unclear if the effect is due to the number of phosphorylation sites or the number of S/T/Q/N residues.

      There are three reasons why the number of phosphorylation sites in the Q-rich motifs is not relevant to their autonomous protein expression-enhancing (PEE) activities:

      First, we have reported previously that phosphorylation-defective Rad51-NTD (Rad51-3SA) and wild-type Rad51-NTD exhibit similar autonomous PEE activity. Mec1/Tel1-dependent phosphorylation of Rad51-NTD antagonizes the proteasomal degradation pathway, increasing the half-life of Rad51 from ∼30 min to ≥180 min (Ref 27; Woo, T. T. et al. 2020).

      1. T. T. Woo, C. N. Chuang, M. Higashide, A. Shinohara, T. F. Wang, Dual roles of yeast Rad51 N-terminal domain in repairing DNA double-strand breaks. Nucleic Acids Res 48, 8474-8489 (2020).

      Second, in our preprint manuscript, we have shown that phosphorylation-defective Rad53-SCD1 (Rad53-SCD1-5STA) also exhibits autonomous PEE activity similar to that of wild-type Rad53-SCD1 (Figure 2D, Figure 4A and Figure 4C).

      Third, as revealed by the results of our preprint manuscript (Figure 4), it is the percentages, and not the numbers, of S/T/Q/N residues that are correlated with the PEE activities of Q-rich motifs.

      The authors also do not discuss if the N-end rule for protein stability applies to the lacZ reporter or the fusion proteins.

      The autonomous PEE function of S/T/Q-rich NTDs is unlikely to be relevant to the N-end rule. The N-end rule links the in vivo half-life of a protein to the identity of its N-terminal residues. In S. cerevisiae, the N-end rule operates as part of the ubiquitin system and comprises two pathways. First, the Arg/N-end rule pathway, involving a single N-terminal amidohydrolase Nta1, mediates deamidation of N-terminal asparagine (N) and glutamine (Q) into aspartate (D) and glutamate (E), which in turn are arginylated by a single Ate1 R-transferase, generating the Arg/N degron. N-terminal R and other primary degrons are recognized by a single N-recognin Ubr1 in concert with ubiquitin-conjugating Ubc2/Rad6. Ubr1 can also recognize several other N-terminal residues, including lysine (K), histidine (H), phenylalanine (F), tryptophan (W), leucine (L) and isoleucine (I) (Bachmair, A. et al. 1986; Tasaki, T. et al. 2012; Varshavsky, A. 2019). Second, the Ac/N-end rule pathway targets proteins containing N-terminally acetylated (Ac) residues. Prior to acetylation, the first amino acid methionine (M) is catalytically removed by Met-aminopeptidases (MetAPs), unless the residue at position 2 is non-permissive (too large) for MetAPs. If a retained N-terminal M or otherwise a valine (V), cysteine (C), alanine (A), serine (S) or threonine (T) residue is followed by residues that allow N-terminal acetylation, the proteins containing these Ac/N degrons are targeted for ubiquitylation and proteasome-mediated degradation by the Doa10 E3 ligase (Hwang, C. S. et al. 2010).

      A. Bachmair, D. Finley, A. Varshavsky, In vivo half-life of a protein is a function of its amino-terminal residue. Science 234, 179-186 (1986).

      T. Tasaki, S. M. Sriram, K. S. Park, Y. T. Kwon, The N-end rule pathway. Annu Rev Biochem 81, 261-289 (2012).

      A. Varshavsky, N-degron and C-degron pathways of protein degradation. Proc Natl Acad Sci 116, 358-366 (2019).

      C. S. Hwang, A. Shemorry, D. Auerbach, A. Varshavsky, The N-end rule pathway is mediated by a complex of the RING-type Ubr1 and HECT-type Ufd4 ubiquitin ligases. Nat Cell Biol 12, 1177-1185 (2010).

      The PEE activities of these S/T/Q-rich domains are unlikely to arise from counteracting the N-end rule for two reasons. First, the first two amino acid residues of Rad51-NTD, Hop1-SCD, Rad53-SCD1, Sup35-PND, Rad51-ΔN, and LacZ-NVH are MS, ME, ME, MS, ME, and MI, respectively, where M is methionine, S is serine, E is glutamic acid and I is isoleucine. Second, Sml1-NTD behaves similarly to these N-terminal fusion tags, despite its methionine and glutamine (MQ) amino acid signature at the N-terminus.

      The most interesting part of the paper is an exploration of S/T/Q/N-rich regions and other repetitive AA runs in 27 proteomes, particularly ciliates. However, this analysis is missing a critical control that makes it nearly impossible to evaluate the importance of the findings. The authors find the abundance of different amino acid runs in various proteomes. They also report the background abundance of each amino acid. They do not use this background abundance to normalize the runs of amino acids to create a null expectation from each proteome. For example, it has been clear for some time (Ruff, 2017; Ruff et al., 2016) that Drosophila contains a very high background of Q's in the proteome and it is necessary to control for this background abundance when finding runs of Q's.

      We apologize for not explaining sufficiently well the topic eliciting this reviewer’s concern in our preprint manuscript. In the second paragraph of page 14, we cite six references to highlight that SCDs are overrepresented in yeast and human proteins involved in several biological processes (32, 74), and that polyX prevalence differs among species (43, 75-77).

      1. Cheung HC, San Lucas FA, Hicks S, Chang K, Bertuch AA, Ribes-Zamora A. An S/T-Q cluster domain census unveils new putative targets under Tel1/Mec1 control. BMC Genomics. 2012;13:664.

      2. Mier P, Elena-Real C, Urbanek A, Bernado P, Andrade-Navarro MA. The importance of definitions in the study of polyQ regions: A tale of thresholds, impurities and sequence context. Comput Struct Biotechnol J. 2020;18:306-13.

      3. Cara L, Baitemirova M, Follis J, Larios-Sanz M, Ribes-Zamora A. The ATM- and ATR-related SCD domain is over-represented in proteins involved in nervous system development. Sci Rep. 2016;6:19050.

      4. Kuspa A, Loomis WF. The genome of Dictyostelium discoideum. Methods Mol Biol. 2006;346:15-30.

      5. Davies HM, Nofal SD, McLaughlin EJ, Osborne AR. Repetitive sequences in malaria parasite proteins. FEMS Microbiol Rev. 2017;41(6):923-40.

      6. Mier P, Alanis-Lobato G, Andrade-Navarro MA. Context characterization of amino acid homorepeats using evolution, position, and order. Proteins. 2017;85(4):709-19.

      We will cite the two references by Kiersten M. Ruff in our revised manuscript.

      K. M. Ruff and R. V. Pappu, (2015) Multiscale simulation provides mechanistic insights into the effects of sequence contexts of early-stage polyglutamine-mediated aggregation. Biophysical Journal 108, 495a.

      K. M. Ruff, J. B. Warner, A. Posey and P. S. Tan (2017) Polyglutamine length dependent structural properties and phase behavior of huntingtin exon1. Biophysical Journal 112, 511a.

      The authors could easily address this problem with the data and analysis they have already collected. However, at this time, without this normalization, I am hesitant to trust the lists of proteins with long runs of amino acid and the ensuing GO enrichment analysis.

      Ruff KM. 2017. Washington University in St.

      Ruff KM, Holehouse AS, Richardson MGO, Pappu RV. 2016. Proteomic and Biophysical Analysis of Polar Tracts. Biophys J 110:556a.

      We thank Reviewer #1 for this helpful suggestion and now address this issue by means of a different approach described below.

      Based on a previous study (43; Mier, P. et al. 2020), we applied seven different thresholds to seek both short and long, as well as pure and impure, polyX strings in 20 different representative near-complete proteomes, including 4X (4/4), 5X (4/5-5/5), 6X (4/6-6/6), 7X (4/7-7/7), 8-10X (≥50%X), 11-20X (≥50%X) and ≥21X (≥50%X).

      To normalize the runs of amino acids and create a null expectation from each proteome, we determined the ratios of the overall number of X residues for each of the seven polyX motifs relative to those in the entire proteome of each species, respectively. The results of four different polyX motifs are shown below, i.e., polyQ (Author response image 1), polyN (Author response image 2), polyS (Author response image 3) and polyT (Author response image 4).
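      The normalization described above can be illustrated with a minimal sketch. It is a deliberate simplification and not the authors' pipeline: it counts only pure runs of a single residue at or above one length cutoff, whereas the seven thresholds above also admit impure tracts. The function name `polyx_fraction` and the toy sequences are hypothetical.

```python
import re


def polyx_fraction(proteins, residue, min_run=4):
    """Fraction of all `residue` occurrences that lie in pure runs of length >= min_run.

    Assumption: only uninterrupted runs are counted; impure polyX tracts
    (for example 4 of 5 positions) are ignored in this simplified sketch.
    """
    # Builds a pattern such as "Q{4,}" for residue "Q" and min_run 4.
    run_pattern = re.compile(f"{re.escape(residue)}{{{min_run},}}")
    total = sum(seq.count(residue) for seq in proteins)
    in_runs = sum(
        len(match.group())
        for seq in proteins
        for match in run_pattern.finditer(seq)
    )
    return in_runs / total if total else 0.0


# Hypothetical toy proteome: 12 Q residues in total, 9 of them inside runs of >= 4.
toy_proteome = ["MQQQQSAQ", "MSTQQNQQQQQ", "MKLV"]
q_fraction = polyx_fraction(toy_proteome, "Q")
```

      Dividing by the proteome-wide count of the residue, rather than by proteome length, is what lets a species with a high background Q usage be compared with one that has a low background.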

      Author response image 1.

      Q contents in 7 different types of polyQ motifs in 20 near-complete proteomes. The five ciliates with reassigned stop codons (TAAQ and TAGQ) are indicated in red. Stentor coeruleus, a ciliate with standard stop codons, is indicated in green.

      Author response image 2.

      N contents in 7 different types of polyN motifs in 20 near-complete proteomes. The five ciliates with reassigned stop codons (TAAQ and TAGQ) are indicated in red. Stentor coeruleus, a ciliate with standard stop codons, is indicated in green.

      Author response image 3.

      S contents in 7 different types of polyS motifs in 20 near-complete proteomes. The five ciliates with reassigned stop codons (TAAQ and TAGQ) are indicated in red. Stentor coeruleus, a ciliate with standard stop codons, is indicated in green.

      Author response image 4.

      T contents in 7 different types of polyT motifs in 20 near-complete proteomes. The five ciliates with reassigned stop codons (TAAQ and TAGQ) are indicated in red. Stentor coeruleus, a ciliate with standard stop codons, is indicated in green.

      The results summarized in these four new figures show that polyX prevalence differs among species and that the overall X contents of polyX motifs often, but not always, correlate with the X usage frequency in entire proteomes (43; Mier, P. et al. 2020).

      Most importantly, our results reveal that, compared to Stentor coeruleus or several non-ciliate eukaryotic organisms (e.g., Plasmodium falciparum, Caenorhabditis elegans, Danio rerio, Mus musculus and Homo sapiens), the five ciliates with reassigned TAAQ and TAGQ codons not only have higher Q usage frequencies, but also more polyQ motifs in their proteomes (Figure 1). In contrast, polyQ motifs prevail in Candida albicans, Candida tropicalis, Dictyostelium discoideum, Chlamydomonas reinhardtii, Drosophila melanogaster and Aedes aegypti, though the Q usage frequencies in their entire proteomes are not significantly higher than those of other eukaryotes (Figure 1). Due to their higher N usage frequencies, Dictyostelium discoideum, Plasmodium falciparum and Pseudocohnilembus persalinus have more polyN motifs than the other 23 eukaryotes we examined here (Figure 2). Generally speaking, all 26 eukaryotes we assessed have similar S usage frequencies and percentages of S contents in polyS motifs (Figure 3). Among these 26 eukaryotes, Dictyostelium discoideum possesses many more polyT motifs, though its T usage frequency is similar to that of the other 25 eukaryotes (Figure 4).

      In conclusion, these new normalized results confirm that the reassignment of stop codons to Q indeed results in both higher Q usage frequencies and more polyQ motifs in ciliates.  

      Reviewer #2 (Public Review):

      Summary:

      This study seeks to understand the connection between protein sequence and function in disordered regions enriched in polar amino acids (specifically Q, N, S and T). While the authors suggest that specific motifs facilitate protein-enhancing activities, their findings are correlative, and the evidence is incomplete. Similarly, the authors propose that the re-assignment of stop codons to glutamine-encoding codons underlies the greater use of glutamine in a subset of ciliates, but again, the conclusions here are, at best, correlative. The authors perform extensive bioinformatic analysis, with detailed (albeit somewhat ad hoc) discussion on a number of proteins. Overall, the results presented here are interesting, but are unable to exclude competing hypotheses.

      Strengths:

      Following up on previous work, the authors wish to uncover a mechanism associated with poly-Q and SCD motifs explaining proposed protein expression-enhancing activities. They note that these motifs often occur in IDRs and hypothesize that structural plasticity could be capitalized upon as a mechanism of diversification in evolution. To investigate this further, they employ bioinformatics to investigate the sequence features of proteomes of 27 eukaryotes. They deepen their sequence space exploration uncovering sub-phylum-specific features associated with species in which a stop-codon substitution has occurred. The authors propose this stop-codon substitution underlies an expansion of poly-Q repeats and increased glutamine distribution.

      Weaknesses:

      The preprint provides extensive, detailed, and entirely unnecessary background information throughout, hampering reading and making it difficult to understand the ideas being proposed. The introduction provides a large amount of detailed background that appears entirely irrelevant for the paper. In many places, detailed discussions of specific proteins that are likely of interest to the authors appear, yet without context these do not enhance the paper for the reader.

      The paper uses many unnecessary, new, or redefined acronyms which makes reading difficult. As examples:

      (1) Prion forming domains (PFDs). Do the authors mean prion-like domains (PLDs), an established term with an empirical definition from the PLAAC algorithm? If yes, they should say this. If not, they must define what a prion-forming domain is formally.

      The N-terminal domain (1-123 amino acids) of S. cerevisiae Sup35 was already referred to as a “prion forming domain (PFD)” in 2006 (Tuite, M. F. 2006). Since then, PFD has also been employed as an acronym in other yeast prion papers (Cox, B. S. et al. 2007; Toombs, J. A. et al. 2011).

      M. F. Tuite, Yeast prions and their prion forming domain. Cell 27, 397-407 (2005).

      B. S. Cox, L. Byrne, M. F. Tuite, Protein Stability. Prion 1, 170-178 (2007).

      J. A. Toombs, N. M. Liss, K. R. Cobble, Z. Ben-Musa, E. D. Ross, [PSI+] maintenance is dependent on the composition, not primary sequence, of the oligopeptide repeat domain. PLoS One 6, e21953 (2011).

      (2) SCD is already an acronym in the IDP field (meaning sequence charge decoration) - the authors should avoid this as their chosen acronym for Serine(S) / threonine (T)-glutamine (Q) cluster domains. Moreover, do we really need another acronym here (we do not).

      SCD was first used in 2005 as an acronym for the Serine (S)/threonine (T)-glutamine (Q) cluster domain in the DNA damage checkpoint field (Traven, A. and Heierhorst, J. 2005). Almost a decade later, SCD became an acronym for “sequence charge decoration” (Sawle, L. et al. 2015; Firman, T. et al. 2018).

      A. Traven and J. Heierhorst, SQ/TQ cluster domains: concentrated ATM/ATR kinase phosphorylation site regions in DNA-damage-response proteins. Bioessays 27, 397-407 (2005).

      L. Sawle and K. Ghosh, A theoretical method to compute sequence dependent configurational properties in charged polymers and proteins. J. Chem. Phys. 143, 085101 (2015).

      T. Firman and K. Ghosh, Sequence charge decoration dictates coil-globule transition in intrinsically disordered proteins. J. Chem. Phys. 148, 123305 (2018).

      (3) Protein expression-enhancing (PEE) - just say expression-enhancing, there is no need for an acronym here.

      Thank you. Since we have shown that addition of Q-rich motifs to LacZ affects protein expression rather than transcription, we think it is better to use the “PEE” acronym.

      The results suggest autonomous protein expression-enhancing activities of regions of multiple proteins containing Q-rich and SCD motifs. Their definition of expression-enhancing activities is vague and the evidence they provide to support the claim is weak. While their previous work may support their claim with more evidence, it should be explained in more detail. The assay they choose is a fusion reporter measuring beta-galactosidase activity and tracking expression levels. Given the presented data they have shown that they can drive the expression of their reporters and that beta gal remains active, in addition to the increase in expression of fusion reporter during the stress response. They have not detailed what their control and mock treatment is, which makes complete understanding of their experimental approach difficult. Furthermore, their nuclear localization signal on the tag could be influencing the degradation kinetics or sequestering the reporter, leading to its accumulation and the appearance of enhanced expression. Their evidence refuting ubiquitin-mediated degradation does not have a convincing control.

      Based on the experimental results, the authors then go on to perform bioinformatic analysis of SCD proteins and polyX proteins. Unfortunately, there is no clear hypothesis for what is being tested; there is a vague sense of investigating polyX/SCD regions, but I did not find the connection between the first and second sections compelling (especially given polar-rich regions have been shown to engage in many different functions). As such, this bioinformatic analysis largely presents as many lists of percentages without any meaningful interpretation. The bioinformatics analysis lacks any kind of rigorous statistical tests, making it difficult to evaluate the conclusions drawn. The methods section is severely lacking. Specifically, many of the methods require the reader to read many other papers. While referencing prior work is, of course, important, the authors should ensure the methods in this paper provide the details needed to allow a reader to evaluate the work being presented. As it stands, this is not the case.

      Thank you. As described in detail below, we have now performed rigorous statistical testing using the GOfuncR package.

      Overall, my major concern with this work is that the authors make two central claims in this paper (as per the Discussion). The authors claim that Q-rich motifs enhance protein expression. The implication here is that Q-rich motif IDRs are special, but this is not tested. As such, they cannot exclude the competing hypothesis ("N-terminal disordered regions enhance expression").

      In fact, “N-terminal disordered regions enhance expression” exactly summarizes our hypothesis.

      On pages 12-13 and Figure 4 of our preprint manuscript, we explained our hypothesis in the paragraph entitled “The relationship between PEE function, amino acid contents, and structural flexibility”.

      The authors also do not explore the possibility that this effect is in part/entirely driven by mRNA-level effects (see Verma et al., Nat Commun 2019).

      As pointed out by the first reviewer, we show evidence that the increase in protein abundance and enzymatic activity is not due to changes in plasmid copy number or mRNA abundance (Figure 2), and that this phenomenon is not affected by translational quality control mutants (Figure 3).

      As such, while these observations are interesting, they feel preliminary and, in my opinion, cannot be used to draw hard conclusions on how N-terminal IDR sequence features influence protein expression. This does not mean the authors are necessarily wrong, but from the data presented here, I do not believe strong conclusions can be drawn. That re-assignment of stop codons to Q increases proteome-wide Q usage. I was unable to understand what result led the authors to this conclusion.

      My reading of the results is that a subset of ciliates has re-assigned UAA and UAG from the stop codon to Q. Those ciliates have more polyQ-containing proteins. However, they also have more polyN-containing proteins and proteins enriched in S/T-Q clusters. Surely if this were a stop-codon-dependent effect, we'd ONLY see an enhancement in Q-richness, not a corresponding enhancement in all polar-rich IDR frequencies? It seems the better working hypothesis is that free-floating ciliate proteomes are enriched in polar amino acids compared to sessile ciliates.

      Thank you. These comments are not supported by the results in Figure 1.

      Regardless, the absence of any kind of statistical analysis makes it hard to draw strong conclusions here.

      We apologize for not explaining more clearly the results of Tables 5-7 in our preprint manuscript.

      To address the concerns raised by both reviewers about our GO enrichment analysis, we have now performed rigorous statistical testing for SCD and polyQ protein overrepresentation using the GOfuncR package (https://bioconductor.org/packages/release/bioc/html/GOfuncR.html). GOfuncR is an R package that conducts standard candidate vs. background enrichment analysis by means of the hypergeometric test. We then adjusted the raw p-values according to the family-wise error rate (FWER). The same method has been applied to GO enrichment analysis of human genomes (Huttenhower, C., et al. 2009).

      Huttenhower, C., Haley, E. M., Hibbs, M. A., Dumeaux, V., Barrett, D. R., Coller, H. A., and Troyanskaya, O. G. Exploring the human genome with functional maps. Genome Research 19, 1093-1106 (2009).
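      The candidate-versus-background test named above can be sketched in a few lines. This is an illustration of the hypergeometric calculation only, written in Python rather than R, and it does not call GOfuncR. The Bonferroni step is a simple stand-in for family-wise error control and is not claimed to be the procedure GOfuncR applies internally. All function names and counts are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(background, annotated, candidates, hits):
    """P(X >= hits) when `candidates` genes are drawn without replacement from
    `background` genes, of which `annotated` carry the GO term of interest."""
    upper = min(annotated, candidates)
    favourable = sum(
        comb(annotated, i) * comb(background - annotated, candidates - i)
        for i in range(hits, upper + 1)
    )
    return favourable / comb(background, candidates)


def bonferroni(p_values):
    """Bonferroni adjustment: a conservative way to control the family-wise error rate."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical toy counts: 20 background genes, 5 annotated with a GO term,
# 4 candidate (for example polyQ-containing) genes, 3 of which carry the term.
p_raw = hypergeom_upper_tail(20, 5, 4, 3)
p_adjusted = bonferroni([p_raw, 0.5])
```

      The upper tail is used because overrepresentation asks how likely it is to see at least the observed number of annotated candidates by chance.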

      The results presented in Author response image 5 and Author response image 6 support our hypothesis that Q-rich motifs prevail in proteins involved in specialized biological processes, including Saccharomyces cerevisiae RNA-mediated transposition, Candida albicans filamentous growth, peptidyl-glutamic acid modification in ciliates with reassigned stop codons (TAAQ and TAGQ), Tetrahymena thermophila xylan catabolism, Dictyostelium discoideum sexual reproduction, Plasmodium falciparum infection, as well as the nervous systems of Drosophila melanogaster, Mus musculus, and Homo sapiens (74). In contrast, peptidyl-glutamic acid modification and microtubule-based movement are not overrepresented with Q-rich proteins in Stentor coeruleus, a ciliate with standard stop codons.

      1. Cara L, Baitemirova M, Follis J, Larios-Sanz M, Ribes-Zamora A. The ATM- and ATR-related SCD domain is over-represented in proteins involved in nervous system development. Sci Rep. 2016;6:19050.

      Author response image 5.

      Selection of biological processes with overrepresented SCD-containing proteins in different eukaryotes. The percentages and numbers of SCD-containing proteins in our search that belong to each indicated Gene Ontology (GO) group are shown. GOfuncR (Huttenhower, C., et al. 2009) was applied for GO enrichment and statistical analysis. The p values adjusted according to the family-wise error rate (FWER) are shown. The five ciliates with reassigned stop codons (TAAQ and TAGQ) are indicated in red. Stentor coeruleus, a ciliate with standard stop codons, is indicated in green.

      Author response image 6.

      Selection of biological processes with overrepresented polyQ-containing proteins in different eukaryotes. The percentages and numbers of polyQ-containing proteins in our search that belong to each indicated Gene Ontology (GO) group are shown. GOfuncR (Huttenhower, C., et al. 2009) was applied for GO enrichment and statistical analysis. The p values adjusted according to the family-wise error rate (FWER) are shown. The five ciliates with reassigned stop codons (TAAQ and TAGQ) are indicated in red. Stentor coeruleus, a ciliate with standard stop codons, is indicated in green.

    1. Author Response

      Reviewer #1 (Public Review):

      Wang et al. present an interesting body of work focused on the effects of high altitude and hypoxia on erythropoiesis, resulting in erythrocytosis. This work is specifically focused on the spleen, identifying splenic macrophages as central cells in this effect. This is logical since these cells are involved in erythrophagocytosis and iron recycling. The results suggest that hypoxia induces splenomegaly with decreased number of splenic macrophages. There is also evidence that ferroptosis is induced in these macrophages, leading to cell destruction. Finally, the data suggest that ferroptosis in splenic red pulp macrophages causes the decrease in RBC clearance, resulting in erythrocytosis aka lengthening the RBC lifespan. However, there are many issues with the presented results, with somewhat superficial data, meaning the conclusions are overstated and there is decreased confidence that the hypotheses and observed results are directly causally related to hypoxia.

      Major points:

      1) The spleen is a relatively poorly understood organ but what is known about its role in erythropoiesis especially in mice is that it functions both to clear as well as to generate RBCs. The latter process is termed extramedullary hematopoiesis and can occur in other bones beyond the pelvis, liver, and spleen. In mice, the spleen is the main organ of extramedullary erythropoiesis. The finding of transiently decreased spleen size prior to splenomegaly under hypoxic conditions is interesting but not well developed in the manuscript. This is a shortcoming as this is an opportunity to evaluate the immediate effect of hypoxia separately from its more chronic effect. Based just on spleen size, no conclusions can be drawn about what happens in the spleen in response to hypoxia.

      Thank you for your insightful comments and questions. The spleen is instrumental in both immune response and the clearance of erythrocytes, as well as serving as a significant reservoir of blood in the body. This organ, characterized by its high perfusion rate and pliability, constricts under conditions of intense stress, such as during peak physical exertion, the diving reflex, or protracted periods of apnea. This contraction can trigger an immediate release of red blood cells (RBCs) into the bloodstream in instances of substantial blood loss or significant reduction of RBCs. Moreover, elevated oxygen consumption rates in certain animal species can be partially attributed to splenic contractions, which augment hematocrit levels and the overall volume of circulating blood, thereby enhancing venous return and oxygen delivery (Dane et al. J Appl Physiol, 2006, 101:289-97; Longhurst et al. Am J Physiol, 1986, 251: H502-9). In our investigation, we noted a significant contraction of the spleen following exposure to hypoxia for a period of one day. We hypothesized that the body, under such conditions, is incapable of generating sufficient RBCs promptly enough to facilitate enhanced oxygen delivery. Consequently, the spleen reacts by releasing its stored RBCs through splenic constriction, leading to a measurable reduction in spleen size.

      However, we agree with you that further investigation is required to fully understand the implications of these changes. Considering the comments, we propose to extend our research by incorporating more detailed examinations of spleen morphology and function during hypoxia, including the potential impact on extramedullary hematopoiesis. We anticipate that such an expanded analysis would not only help elucidate the initial response to hypoxia but also provide insights into the more chronic effects of this condition on spleen function and erythropoiesis.

      2) Monocyte repopulation of tissue resident macrophages is a minor component of the process being described and it is surprising that monocytes in the bone marrow and spleen are also decreased. Can the authors conjecture why this is happening? Typically, the expectation would be that a decrease in tissue resident macrophages would be accompanied by an increase in monocyte migration into the organ in a compensatory manner.

      We appreciate your insightful query regarding the observed decrease in monocytes in the bone marrow and spleen, particularly considering the typical compensatory increase in monocyte migration into organs following a decrease in tissue resident macrophages.

      The observed decrease in monocytes within the bone marrow is likely attributable to the fact that monocytes and precursor cells for red blood cells (RBCs) both originate from the same hematopoietic stem cells within the bone marrow. It is well established that exposure to hypobaric hypoxia (HH) induces erythroid differentiation specifically within the bone marrow, originating from these hematopoietic stem cells. As such, we postulate that the differentiation into monocytes is reduced under hypoxic conditions, which may subsequently cause a decrease in migration to the spleen.

      Furthermore, we hypothesize that an increased migration of monocytes to other tissues under HH exposure may also contribute to the decreased migration to the spleen. The liver, which partially contributes to the clearance of RBCs, may play a role in this process. Our investigations to date have indeed identified an increased monocyte migration to the liver. We were pleased to discover an elevation in CSF1 expression in the liver following HH exposure for both 7 and 14 days. This finding was corroborated through flow cytometry, which confirmed an increase in monocyte migration to the liver.

      Consequently, we propose that under HH conditions, the liver requires an increased influx of monocytes, which in turn leads to a decrease in monocyte migration to the spleen. However, it is important to note that these findings will be discussed more comprehensively in our forthcoming publication, and as such, the data pertaining to these results have not been included in the current manuscript.

      3) Figure 3 does not definitively provide evidence that cell death is specifically occurring in splenic macrophages and the fraction of Cd11b+ cells is not changed in NN vs HH. Furthermore, the IHC of F4/80 in Fig 3U is not definitive as cells can express F4/80 more or less brightly and no negative/positive controls are shown for this panel.

      We appreciate your insightful comments and critiques regarding Figure 3. We acknowledge that the figure, as presented, does not definitively demonstrate that cell death is specifically occurring in splenic macrophages. While it is challenging to definitively determine the occurrence of cell death in macrophages based solely on Figure 3D-F, our single-cell analysis provides strong evidence that such an event occurs. We initially observed cell death within the spleen under hypobaric hypoxia (HH) conditions, and to discern the precise cell type involved, we conducted single-cell analyses. Regrettably, we did not articulate this clearly in our preliminary manuscript. In the revised version, we have modified the sequence of Figure 3A-C and Figure 3D-F for better clarity. Besides, we observed a significant decrease in the fraction of F4/80hiCD11bhi macrophages under HH conditions compared to NN. To make the changes more evident in CD86 and CD206, we have transformed these scatter plots into histograms in our revised manuscript.

      Considering the limitations of F4/80 as a conclusive macrophage identifier, we have concurrently presented the immunohistochemical (IHC) analyses of heme oxygenase-1 (HO-1). Functioning as a macrophage marker, particularly in cells involved in iron metabolism, HO-1 offers additional diagnostic accuracy. Observations from both F4/80 and HO-1 staining suggested a primary localization of positively stained cells within the splenic red pulp. Following exposure to hypobaric hypoxia (HH) conditions, a decrease was noted in the expression of both F4/80 and HO-1. This decrease implies that HH conditions contribute to a reduction in macrophage population and impede the iron metabolism process. In the revised version of our manuscript, we have enhanced the clarity of Figure 3U to illustrate the presence of positive staining, with an emphasis on HO-1 staining, which is predominantly observed in the red pulp.

      4) The phagocytic function of splenic red pulp macrophages relative to infection cannot be used directly to understand erythrophagocytosis. The standard approach is to use opsonized RBCs in vitro. Furthermore, RBC survival is a standard method to assess erythrophagocytosis function. In this method, biotin is injected via tail vein directly and small blood samples are collected to measure the clearance of biotinylated RBCs by flow cytometry; kits are available to accomplish this. Because the method is standard, Fig 4D is not necessary and Fig 4E needs to be performed only in blood by sampling mice repeatedly and comparing the rate of biotin decline in HH with NN (not comparing 7 d with 14 d).

      We appreciate your insightful comments and suggestions. We concur that the phagocytic function of splenic red pulp macrophages in the context of infection may not be directly translatable to understanding erythrophagocytosis. Given our assessment that the use of Cy5.5-labeled E. coli alone may not be sufficient to accurately evaluate the phagocytic function of macrophages, we extended our study to include the use of NHS-biotin-labeled RBCs to assess phagocytic capabilities. While the presence of biotin-labeled RBCs in the blood could provide an indication of RBC clearance, this measure does not exclusively reflect the spleen's role in the process, as it fails to account for the clearance activities of other organs.

      Consequently, we propose that the remaining biotin-labeled RBCs in the spleen may provide a more direct representation of the organ's function in RBC clearance and sequestration. Our observations of diminished erythrophagocytosis at both 7 and 14 days following exposure to HH guided our subsequent efforts to quantify biotin-labeled RBCs in both the circulatory system and spleen. These measurements were conducted during the 7 to 14-day span following the confirmation of impaired erythrophagocytosis. Comparative evaluation of RBC clearance rates under NN and HH conditions provided further evidence supporting our preliminary observations, with the data revealing a decrease in the RBC clearance rate in the context of HH conditions. In response to feedback from other reviewers, we have elected to exclude the phagocytic results and the diagram of the erythrocyte labeling assay. These amendments will be incorporated into the revised manuscript. The reviewers' constructive feedback has played a crucial role in refining the methodological precision and coherence of our investigation.
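
      The comparison described above can be sketched in a few lines. This is only an illustration of one way to express a clearance rate over the day 7 to day 14 window, normalised to the day-7 signal so that label loss before day 7 cancels out. The function name, the normalisation and all numbers are hypothetical and are not taken from the manuscript.

```python
def clearance_rate(labeled_day7: float, labeled_day14: float, days: float = 7.0) -> float:
    """Fraction of the day-7 labeled-RBC signal lost per day between day 7 and day 14.

    Assumption: the inputs are percentages of biotin-positive RBCs measured by
    flow cytometry in the same compartment (blood or spleen) on both days.
    """
    if labeled_day7 <= 0:
        raise ValueError("day-7 signal must be positive")
    return (labeled_day7 - labeled_day14) / labeled_day7 / days


# Hypothetical values, for illustration only (not data from the study).
nn_rate = clearance_rate(labeled_day7=40.0, labeled_day14=20.0)
hh_rate = clearance_rate(labeled_day7=40.0, labeled_day14=30.0)

# A lower rate under HH than under NN would correspond to slower RBC clearance.
print(f"NN: {nn_rate:.4f} per day, HH: {hh_rate:.4f} per day")
```

      Normalising to the day-7 value is a design choice in this sketch: it compares groups on the proportion cleared within the window rather than on absolute signal.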

      5) It is unclear whether Tuftsin has a specific effect on phagocytosis of RBCs without other potential confounding effects. Furthermore, quantifying iron in red pulp splenic macrophages requires alternative readily available more quantitative methods (e.g. sorted red pulp macrophages non-heme iron concentration).

      We appreciate your comments and questions regarding the potential effect of Tuftsin on the phagocytosis of RBCs and the quantification of iron in red pulp splenic macrophages. Regarding the role of Tuftsin, we concur that the literature directly associating Tuftsin with erythrophagocytosis is scant. The work of Gino Roberto Corazza et al. does suggest a link between Tuftsin and general phagocytic capacity, but it does not specifically address erythrophagocytosis (Am J Gastroenterol, 1999;94:391-397). We agree that further investigations are required to elucidate the potential confounding effects and to ascertain whether Tuftsin has a specific impact on the phagocytosis of RBCs. Concerning the quantification of iron in red pulp splenic macrophages, we acknowledge your suggestion to employ readily available and more quantitative methods. We have incorporated additional Fe2+ staining in the spleen at two time points: 7 and 14 days after HH exposure (refer to the following Figure). The resulting data reveal increased deposition of Fe2+ within the red pulp, as shown in Figure 5 (panels L and M) and Figure 7 (panels L and M).

      6) In Fig 5, PBMCs are not thought to represent splenic macrophages and although of some interest, does not contribute significantly to the conclusions regarding splenic macrophages at the heart of the current work. The data is also in the wrong direction, namely providing evidence that PBMCs are relatively iron poor which is not consistent with ferroptosis which would increase cellular iron.

      We appreciate your insightful critique regarding Figure 5 and the interpretation of our data on peripheral blood mononuclear cells (PBMCs) in relation to splenic macrophages. We understand that PBMCs do not directly represent splenic macrophages, and we agree that any conclusions drawn from PBMCs must be considered with caution when discussing the behavior of splenic macrophages.

      The primary rationale for incorporating PBMCs into our study was to investigate the potential correspondence between their gene expression changes and those observed in the spleen after HH exposure. This was posited as a working hypothesis for further exploration rather than a conclusive statement. The gene expression in PBMCs was congruous with changes in the spleen's gene expression, demonstrating an iron deficiency phenotype, ostensibly due to the mobilization of intracellular iron for hemoglobin synthesis. Thus, it is plausible that NCOA4 may facilitate iron mobilization through the degradation of ferritin, the iron-storage protein.

      It remains ambiguous whether ferroptosis was initiated in the PBMCs during our study. Ferroptosis primarily occurs as a response to an increase in Fe2+ rather than an overall increase in intracellular iron. Our preliminary proposition was that relative changes in gene expression in PBMCs could potentially mirror corresponding changes in protein expression in the spleen, thereby potentially indicating alterations in iron processing capacity post-HH exposure. However, we fully acknowledge that this is a conjecture requiring further empirical substantiation or clinical validation.

      7) Tfr1 increase is typically correlated with cellular iron deficiency while ferroptosis is consistent with iron loading. The direction of the changes in multiple elements relevant to iron trafficking is somewhat confusing and without additional evidence, there is little confidence that the authors have reached the correct conclusion. Furthermore, the results here are analyses of total spleen samples rather than specific cells in the spleen.

      We appreciate your astute comments and agree that the observed increase in transferrin receptor (TfR) expression, typically associated with cellular iron deficiency, appears contradictory to the expected iron-loading state associated with ferroptosis. We understand that this apparent contradiction might engender some uncertainty about our conclusions.

      In our investigation, we evaluated total spleen samples as opposed to distinct cell types within the spleen, a factor that could have contributed to the seemingly discordant findings. An important point to bear in mind is the existence of immature RBCs in the spleen, particularly within the hematopoietic island, where these immature RBCs cluster around nurse macrophages. These immature RBCs contain abundant TfR, which is needed for iron uptake and hemoglobin synthesis. These cells, which are difficult to eliminate via perfusion, might have contributed to the observed upregulation in TfR expression, especially after HH exposure. Our further research revealed that the expression of TfR in macrophages diminished following hypoxic conditions, suggesting that the elevated TfR expression in tissue samples may predominantly originate from other cell types, especially immature RBCs (refer to subsequent Figure).

      Reviewer #2 (Public Review):

      The authors aimed at elucidating the development of high altitude polycythemia which affects mice and men staying in the hypoxic atmosphere at high altitude (hypobaric hypoxia; HH). HH causes increased erythropoietin production which stimulates the production of red blood cells. The authors hypothesize that increased production is only partially responsible for exaggerated red blood cell production, i.e. polycythemia, but that decreased erythrophagocytosis in the spleen contributes to high red blood cells counts.

      The main strength of the study is the use of a mouse model exposed to HH in a hypobaric chamber. However, not all of the reported results are convincing due to some smaller effects which one may doubt to result in the overall increase in red blood cells as claimed by the authors. Moreover, direct proof for reduced erythrophagocytosis is compromised due to a strong spontaneous loss of labelled red blood cells, although effects of labelled E. coli phagocytosis are shown. Their discussion addresses some of the unexpected results, such as the reduced expression of HO-1 under hypoxia but due to the above-mentioned limitations much of the discussion remains hypothetical.

      Thank you for your valuable feedback and insight. We appreciate the recognition of the strength of our study model, the exposure of mice to hypobaric hypoxia (HH) in a hypobaric animal chamber. We also understand your concerns about the smaller effects and their potential impact on the overall increase in red blood cells (RBCs), as well as the apparent reduced erythrophagocytosis due to the loss of labelled RBCs.

      Polycythemia under HH has been attributed predominantly to amplified erythropoiesis. The focus of our research was to underscore that compromised erythrophagocytosis may further accelerate high altitude polycythemia (HAPC). Because labelled RBCs are lost spontaneously in vivo, we first established that erythrophagocytosis was impaired at both 7 and 14 days of HH exposure, and then compared the RBC clearance rate over the interval from day 7 to day 14. This approach was designed to negate the effects of spontaneous loss of labelled RBCs in both NN and HH conditions. Correspondingly, the results from blood and spleen analyses corroborated a decline in the RBC clearance rate under HH compared with NN conditions.

      Apart from the E. coli phagocytosis and the labeled RBCs experiment (this part of the results was removed in the revision), the injection of Tuftsin further substantiated the impairment of erythrophagocytosis in the HH spleen, as evidenced by the observed decrease in iron within the red pulp of the spleen post-perfusion. Furthermore, to validate our findings, we incorporated RBCs staining in splenic cells at 7 and 14 days of HH exposure, which provided concrete confirmation of impaired erythrophagocytosis (new Figure 4E).

      As for the reduced expression of heme oxygenase-1 (HO-1) under hypoxia, we agree that this was an unexpected result, and we are in the process of further exploring the underlying mechanisms. It is possible that there are other regulatory pathways at play that are yet to be identified. However, we believe that by offering possible interpretations of our data and potential directions for future research, we contribute to the ongoing scientific discourse in this area.

      Reviewer #3 (Public Review):

      The manuscript by Yang et al. investigated in mice how hypobaric hypoxia can modify the RBC clearance function of the spleen, a concept that is of interest. Via interpretation of their data, the authors proposed a model that hypoxia causes an increase in cellular iron levels, possibly in RPMs, leading to ferroptosis, and downregulates their erythrophagocytic capacity. However, most of the data is generated on total splenocytes/total spleen, and the conclusions are not always supported by the presented data. The model of the authors could be questioned by the paper by Youssef et al. (which the authors cite, but in an unclear context) that the ferroptosis in RPMs could be mediated by augmented erythrophagocytosis. As such, the loss of RPMs in vivo which is indeed clear in the histological section shown (and is a strong and interesting finding) can be not directly caused by hypoxia, but by enhanced RBC clearance. Such a possibility should be taken into account.

      Thank you for your insightful comments and constructive feedback. In their research, Youssef et al. (2018) discerned that elevated erythrophagocytosis of stressed red blood cells (RBCs) instigates ferroptosis in red pulp macrophages (RPMs) within the spleen, as evidenced in a mouse model of transfusion. This augmentation of erythrophagocytosis was conspicuous five hours post-injection of RBCs. Conversely, our study elucidated the decrease in erythrophagocytosis in the spleen after both 7 and 14 days.

      Typically, macrophages exhibit an enhanced phagocytic capacity in the immediate aftermath of stress or stimulation. Nonetheless, the temporal points of observation in our study were considerably extended (seven and fourteen days). It remains uncertain whether phagocytic capability was amplified during the acute phase of HH exposure—particularly within the first day, considering that splenoconstriction under HH for one day results in the release of stored RBCs into the bloodstream—and whether this initial response could precipitate ferroptosis and subsequently diminished erythrophagocytosis at the 7 or 14 day marks under continued HH conditions.

      Major points:

      1) The authors present data from total splenocytes and then relate the obtained data to RPMs, which are quantitatively a minor population in the spleen. Eg, labile iron is increased in the splenocytes upon HH, but the manuscript does not show that this occurs in the red pulp or RPMs. They also measure gene/protein expression changes in the total spleen and connect them to changes in macrophages, as indicated in the model Figure (Fig. 7). HO-1 and levels of Ferritin (L and H) can be attributed to the drop in RPMs in the spleen. Are any of these changes preserved cell-intrinsically in cultured macrophages? This should be shown to support the model (relates also to lines 487-88, where the authors again speculate that hypoxia decreases HO-1 which was not demonstrated). In the current stage, for example, we do not know if the labile iron increase in cultured cells and in the spleen in vivo upon hypoxia is the same phenomenon, and why labile iron is increased. To improve the manuscript, the authors should study specifically RPMs.

      We express our gratitude for your perceptive remarks. In our initial manuscript, we did not evaluate labile iron within the red pulp and red pulp macrophages (RPMs). To address this oversight, we utilized the Lillie staining method, in accordance with the protocol outlined by Liu et al., (Chemosphere, 2021, 264(Pt 1):128413), to discern Fe2+ presence within these regions. The outcomes were consistent with our antecedent Western blot and flow cytometry findings in the spleen, corroborating an increment in labile iron specifically within the red pulp of the spleen.

      However, we acknowledge the necessity for supplementary experiments to further validate these findings. Additionally, we examined the expression of heme oxygenase-1 (HO-1) and iron-related proteins, including transferrin receptor (TfR), ferroportin (Fpn), ferritin (Ft), and nuclear receptor coactivator 4 (NCOA4), in primary macrophages subjected to 1% hypoxic conditions, both with and without hemoglobin treatment. Our results indicated that the expression of ferroptosis-related proteins was consistent with the in vivo studies, whereas the expression of iron-related proteins differed between in vitro and in vivo conditions. This suggests that the increases in labile iron in cultured cells and in the spleen in vivo upon hypoxia are not identical phenomena. However, the precise mechanism remains elusive.

      In our study, we observed a decrease in HO-1 protein expression following 7 and 14 days of HH exposure, as shown in Figure 3U, 5A, and S1A. This finding contradicts previous research that identified HO-1 as a hypoxia-inducible factor (HIF) target under hypoxic conditions (P J Lee et al., 1997). Our discussion, therefore, addressed the potential discrepancy in HO-1 expression under HH. According to our findings, HO-1 regulation under HH appears to be predominantly influenced by macrophage numbers and the RBCs to be processed in the spleen or macrophages, rather than by hypoxia alone.

      It is challenging to discern whether the increased labile iron observed in vitro accurately reflects the in vivo phenomenon, as replicating in vitro the iron requirements for HH-induced RBC production is inherently difficult. However, by integrating our in vivo and in vitro studies, we determined that the elevated Fe2+ levels were not dependent on HO-1 protein expression, as HO-1 levels increased in vitro but decreased in vivo under hypoxic/HH exposure.

      2) The paper uses flow cytometry, but how this method was applied is suboptimal: there are no gating strategies, no indication if single events were determined, and how cell viability was assessed, which are the parent populations when % of cells is shown on the graphs. How RBCs in the spleen could be analyzed without dedicated cell surface markers? A drop in splenic RPMs is presented as the key finding of the manuscript but Fig. 3M shows gating (suboptimal) for monocytes, not RPMs. RPMs are typically F4/80-high, CD11b-low (again no gating strategy is shown for RPMs). Also, the authors used single-cell RNAseq to detect a drop in splenic macrophages upon HH, but they do not indicate in Fig. A-C which cluster of cells relates to macrophages. Cell clusters are not identified in these panels, hence the data is not interpretable.

      Thank you for your comments and constructive critique regarding our flow cytometry methodology and presentation. We understand the need for greater transparency and detailed explanation of our procedures, and we acknowledge that the lack of gating strategies and other pertinent information in our initial manuscript may have affected the clarity of our findings.

      In our initial report, we provided an overview of the decline in migrated macrophages (F4/80hiCD11bhi), including both M1 and M2 expression in migrated macrophages, as illustrated in Figure 3, but did not specifically address the changes in red pulp macrophages (RPMs). Based on previous results, it is difficult to identify CD11b- and CD11blo cells. We will repeat the results and attempt to identify F4/80hiCD11blo cells in the revised manuscript. The results of the reanalysis are now included (Figure 3M). However, single-cell in vivo analysis studies may more accurately identify specific cell types that decrease after exposure to HH.
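
      To make the distinction between the two populations concrete, the following minimal sketch separates F4/80hiCD11bhi events from F4/80hiCD11blo events using two intensity cut-offs. It is not the gating used in the manuscript: the function names, the threshold values and the example events are all hypothetical, and real gates would be set from controls after singlet and viability gating.

```python
from typing import Iterable, Tuple


def classify_event(f480: float, cd11b: float,
                   f480_hi: float = 1e4, cd11b_hi: float = 1e3) -> str:
    """Assign one event to a gate from its F4/80 and CD11b intensities.

    The default cut-offs are placeholders, not values from the study.
    """
    if f480 < f480_hi:
        return "other"
    return "F4/80hi CD11bhi" if cd11b >= cd11b_hi else "F4/80hi CD11blo"


def gate_fraction(events: Iterable[Tuple[float, float]], gate: str) -> float:
    """Fraction of the parent population (all events passed in) that falls in a gate."""
    events = list(events)
    if not events:
        raise ValueError("no events in parent population")
    hits = sum(1 for f480, cd11b in events if classify_event(f480, cd11b) == gate)
    return hits / len(events)


# Hypothetical (F4/80, CD11b) intensities, for illustration only.
example_events = [(2e4, 5e2), (2e4, 5e3), (1e2, 5e3), (3e4, 1e2)]
rpm_like = gate_fraction(example_events, "F4/80hi CD11blo")
migrated = gate_fraction(example_events, "F4/80hi CD11bhi")
```

      Reporting each fraction together with its parent population, as `gate_fraction` does implicitly, addresses the reviewer's question about which population the percentages refer to.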

      Furthermore, we substantiated the reduction in red pulp, as evidenced by Figure 4J, given that iron processing primarily occurs within the red pulp. In Figure 3, our initial objective was merely to illustrate the reduction in total macrophages in the spleen following HH exposure.

      To further clarify the characterization of various cell types, we conducted a single-cell analysis. Our findings indicated that clusters 0, 1, 3, 4, 14, 18, and 29 represented B cells, clusters 2, 10, 12, and 28 represented T cells, clusters 15 and 22 corresponded to NK cells, clusters 5, 11, 13, and 19 represented NKT cells, clusters 6, 9, and 24 represented cell cycle cells, clusters 26 and 17 represented plasma cells, clusters 21 and 23 represented neutrophils, cluster 30 represented erythrocytes, and clusters 7, 8, 16, 20, 24, and 27 represented dendritic cells (DCs) and macrophages, as depicted in Figure 3E.
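
      The cluster annotation listed above can be written as a lookup table, which makes it easy to query and to check for consistency. The sketch below transcribes the cluster numbers exactly as they appear in the text. The helper names and the assumption that clusters are numbered 0 to 30 are ours, not the authors'.

```python
from collections import Counter
from typing import Dict, List

# Cluster numbers transcribed as written in the response.
CLUSTER_GROUPS: Dict[str, List[int]] = {
    "B cells": [0, 1, 3, 4, 14, 18, 29],
    "T cells": [2, 10, 12, 28],
    "NK cells": [15, 22],
    "NKT cells": [5, 11, 13, 19],
    "cell cycle cells": [6, 9, 24],
    "plasma cells": [26, 17],
    "neutrophils": [21, 23],
    "erythrocytes": [30],
    "DCs and macrophages": [7, 8, 16, 20, 24, 27],
}


def labels_for(cluster: int) -> List[str]:
    """Return every annotation that lists the given cluster number."""
    return [name for name, ids in CLUSTER_GROUPS.items() if cluster in ids]


def clusters_with_multiple_labels() -> List[int]:
    """Cluster numbers that appear under more than one annotation."""
    counts = Counter(c for ids in CLUSTER_GROUPS.values() for c in ids)
    return sorted(c for c, n in counts.items() if n > 1)


def clusters_without_label(highest: int = 30) -> List[int]:
    """Cluster numbers in 0..highest that no annotation lists (range is an assumption)."""
    assigned = {c for ids in CLUSTER_GROUPS.values() for c in ids}
    return [c for c in range(highest + 1) if c not in assigned]


print(labels_for(24))
print(clusters_with_multiple_labels(), clusters_without_label())
```

      As transcribed, cluster 24 is listed under two annotations and cluster 25 under none. The sketch only reports this and does not attempt to resolve it.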

      3) The authors draw conclusions that are not supported by the data, some examples: a) they cannot exclude eg the compensatory involvement of the liver in the RBCs clearance (the differences between HH sham and HH splenectomy is mild in Fig. 2 E, F and G).

      Thank you for your insightful comments and for pointing out the potential involvement of other organs, such as the liver, in the RBC clearance under HH conditions. We concur with your observation that the differences between the HH sham and HH splenectomy conditions in Fig. 2 E, F, and G are modest. This could indeed suggest a compensatory role of other organs in RBC clearance when splenectomy is performed. Our intent, however, was to underscore the primary role of the spleen in this process under HH exposure.

      In fact, after our initial investigations, we conducted a more extensive study examining the role of the liver in RBC clearance under HH conditions. Our findings, as illustrated in the figures submitted with this response, indeed support a compensatory role for the liver. Specifically, we observed an increase in macrophage numbers and phagocytic activity in the liver under HH conditions. Although the differences in RBC count between the HH sham and HH splenectomy conditions may seem minor, it is essential to consider the unit of this measurement, which is value × 10¹²/ml. Even a small numerical difference can represent a significant biological variation at this scale.
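
      The point about scale can be illustrated with a short calculation that converts a difference in plotted values into an absolute number of cells. The unit is taken as stated in the text. The two input values are hypothetical and are not read from Figure 2.

```python
UNIT_SCALE = 1e12  # plotted values are multiples of 10^12 cells per ml, as stated in the text


def absolute_difference_per_ml(value_a: float, value_b: float) -> float:
    """Convert a difference between two plotted RBC counts into cells per ml."""
    return (value_a - value_b) * UNIT_SCALE


# Hypothetical plotted values, for illustration only.
diff = absolute_difference_per_ml(10.5, 10.0)
print(f"A difference of 0.5 on the axis is {diff:.1e} cells per ml")
```

      Whether a difference of this size is biologically meaningful still depends on the variance within each group, which this arithmetic does not address.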

      b) splenomegaly is typically caused by increased extramedullary erythropoiesis, not RBC retention. Why do the authors support the second possibility? Related to this, why do the authors conclude that data in Fig. 4 G,H support the model of RBC retention? A significant drop in splenic RBCs (poorly gated) was observed at 7 days, between NN and HH groups, which could actually indicate increased RBC clearance capacity = less retention.

      Prior investigations have predominantly suggested that spleen enlargement under hypoxic conditions stems from the spleen's extramedullary hematopoiesis. Nevertheless, an intriguing study conducted in 1994 by the General Hospital of Xizang Military Region reported substantial dilation and congestion of splenic sinuses in high altitude polycythemia (HAPC) patients. This finding was based on the dissection of spleens from 12 patients with HAPC (Zou Xunda, et al., Southwest Defense Medicine, 1994;5:294-296). Moreover, a recent study indicated that extramedullary erythropoiesis reaches its zenith between 3 and 7 days (Wang H et al., 2021).

      Considering these findings, the present study postulates that hypoxia-induced inhibition of erythrophagocytosis may lead to RBC retention. However, we acknowledge that the manuscript in its current preprint form does not offer conclusive evidence to substantiate this hypothesis. To bridge this gap, we further conducted experiments where the spleen was perfused, and total cells were collected post HH exposure. These cells were then smeared onto slides and subjected to Wright staining. Our results unequivocally demonstrate an evident increase in deformation and retention of RBCs in the spleen following 7 and 14 days of HH exposure. This finding strengthens our initial hypothesis and contributes a novel perspective to the understanding of splenic responses under hypoxic conditions.

      c) lines 452-54: there is no data for decreased phagocytosis in vivo, especially in the context of erythrophagocytosis. This should be done with stressed RBCs transfusion assays, very good examples, like from Youssef et al. or Theurl et al. are available in the literature.

      Thank you. In their seminal work, Youssef and colleagues demonstrated that the transfusion of stressed RBCs triggers erythrophagocytosis and subsequently incites ferroptosis in red pulp macrophages (RPMs) within five hours. The applicability of this model to evaluating macrophage phagocytosis in the spleen or RPMs under HH conditions may be limited, because HH has already induced erythropoiesis in vivo. In addition, it is unclear whether the membrane characteristics of stress-induced RBCs are similar to those of HH-induced RBCs, which matters because these characteristics are an important signal for phagocytosis in vivo. We therefore cannot currently discern whether any change in phagocytosis is instigated by the stressed RBCs themselves or by HH-induced changes in macrophages. Nonetheless, we appreciate the potential value of this approach and intend to explore it in future investigations, since distinguishing the effects of stressed RBCs from those of HH on macrophage phagocytosis could yield significant insights into the mechanisms governing these processes.

      d) Line 475 - ferritinophagy was not shown in response to hypoxia by the manuscript, especially that NCOA4 is decreased, at least in the total spleen.

      Drawing on the research published in eLife in 2015, it was unequivocally established that ferritinophagy, facilitated by Nuclear Receptor Coactivator 4 (NCOA4), is indispensable for erythropoiesis. This process is modulated by iron-dependent HECT and RLD domain containing E3 ubiquitin protein ligase 2 (HERC2)-mediated proteolysis (Joseph D Mancias et al., eLife. 2015; 4: e10308). As is widely recognized, NCOA4 plays a critical role in directing ferritin (Ft) to the lysosome, where both NCOA4 and Ft undergo coordinated degradation.

      In our study, we provide evidence that exposure to HH stimulates erythropoiesis (Figure 1). We propose that this, in turn, could promote ferritinophagy via NCOA4, resulting in a decrease in NCOA4 protein levels post-HH exposure. We will perform additional experiments to verify this proposal. This interpretation not only aligns with the established understanding of ferritinophagy and erythropoiesis but also adds a novel dimension to the understanding of cellular responses to hypoxic conditions.

      4) In a few cases, the authors show only representative dot plots or histograms, without quantification for n>1. In Fig. 4B the authors write about a significant decrease (although with n=1 no statistics could be applied here; of note, it is not clear what kind of samples were analyzed here). Another example is Fig. 6I. In this case, it is even more important as the data conflict with the cited article and the new one: PMCID: PMC9908853 which shows that hypoxia stimulates efferocytosis. Sometimes the manuscript claims that some changes are observed, although they are not visible in representative figures (eg for M1 and M2 macrophages in Fig. 3M).

      We recognize that our initial portrayal of Figure 4B was lacking in precision, given that it did not include the corresponding statistical graph. While our results demonstrated a significant reduction in the ability to phagocytose E. coli, in line with the recommendations of other reviewers, we have opted to remove the results pertaining to E. coli phagocytosis in this revision, as they primarily reflected immune function. In relation to PMC9908853, which reported metabolic adaptation facilitating enhanced macrophage efferocytosis in limited-oxygen environments, it is worth noting that the macrophages investigated in this study were derived from ER-Hoxb8 macrophage progenitors following the removal of β-estradiol. Consequently, questions arise regarding the comparability between these cultured macrophages and primary macrophages obtained fresh from the spleen post HH exposure. The characteristics and functions of these two different macrophage sources may not align precisely, and this distinction necessitates further investigation.

      5) There are several unclear issues in methodology:

      • what is the purity of primary RPMs in the culture? RPMs are quantitatively poorly represented in splenocyte single-cell suspensions. This reviewer is quite skeptical that the processing of splenocytes from approx 1 mm³ of tissue was sufficient to establish primary RPM cultures. The authors should prove that the cultured cells were indeed RPMs, not monocyte-derived macrophages or other splenic macrophage subtypes.

      Thank you for your thoughtful comments and inquiries. First, we apologize that this was not made clear in the original manuscript. The purity of the primary RPMs in our culture was approximately 40%, as identified by F4/80hiCD11blo markers using flow cytometry. We recognize that RPMs are typically underrepresented in splenocyte single-cell suspensions, and the concern you raise about the potential for contamination by other cell types is valid.

      We apologize for any ambiguities in the methodological description that may have led to misunderstandings during the review. The entire spleen is typically used for splenic macrophage culture. The size of the spleen varies depending on the species and age of the animal, but in mice it is commonly approximately 1 cm in length. The spleen is dissected into small fragments, each approximately 1 mm³ in volume, to aid enzymatic digestion. The procedure therefore does not use a single 1 mm³ tissue fragment for RPM cultures. Although the isolation and culture of spleen macrophages can present considerable challenges, our method has been optimized to enhance the yield of this specific cell population.

      • (around line 183) In the description of flow cytometry, there are several missing issues. In 1) it is unclear which type of samples were analyzed. In 2) it is not clear how splenocyte cell suspension was prepared.

      1) Whole blood was extracted from the mice and collected into an anticoagulant tube, which was then set aside for subsequent thiazole orange (TO) staining. 2) Splenic tissue was procured from the mice and subsequently processed into a single-cell suspension using a 40 μm filter. The erythrocytes within the entire sample were subsequently lysed and eliminated, and the remaining cell suspension was resuspended in phosphate-buffered saline (PBS) in preparation for ensuing analyses.

      We have meticulously revised these methodological details in the corresponding section of the manuscript to ensure clarity and precision.

      • In line 192: what does it mean: 'This step can be omitted from cell samples'?

      Intracellular divalent iron content and lipid peroxidation level were quantified as follows. Splenic tissue was first processed into a single-cell suspension, and the RBCs were then lysed. This lysis step is unnecessary for isolated cell samples. Subsequently, a total of 1 × 10⁶ cells were incubated with 100 μL of BioTracker Far-red Labile Fe2+ Dye (1 mM, Sigma, SCT037, USA) for 1 hour, or alternatively with C11-Bodipy 581/591 (10 μM, Thermo Fisher, D3861, USA) for 30 minutes. After incubation, cells were washed twice with PBS. Flow cytometric analysis was then performed, using the FL6 (638 nm/660 nm) channel to determine intracellular divalent iron content and the FL1 (488 nm/525 nm) channel to quantify the lipid peroxidation level.
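
      For readers who want the staining parameters in one place, the protocol above can be summarised as a small data structure. The sketch below only restates values given in the text. The class and field names are hypothetical, and reading the paired wavelengths as excitation/emission is our assumption.

```python
from dataclasses import dataclass
from typing import Tuple

CELLS_PER_SAMPLE = 1_000_000  # 1 x 10^6 cells per staining reaction, as stated


@dataclass(frozen=True)
class ProbeStep:
    readout: str
    probe: str
    concentration_um: float  # micromolar
    incubation_min: int
    channel: str
    wavelengths_nm: Tuple[int, int]  # assumed to be (excitation, emission)


PROBES = (
    ProbeStep("labile Fe2+", "BioTracker Far-red Labile Fe2+ Dye",
              concentration_um=1000.0, incubation_min=60,
              channel="FL6", wavelengths_nm=(638, 660)),
    ProbeStep("lipid peroxidation", "C11-Bodipy 581/591",
              concentration_um=10.0, incubation_min=30,
              channel="FL1", wavelengths_nm=(488, 525)),
)


def channel_for(readout: str) -> str:
    """Look up the detection channel for a given readout."""
    for step in PROBES:
        if step.readout == readout:
            return step.channel
    raise KeyError(readout)


print(channel_for("labile Fe2+"), channel_for("lipid peroxidation"))
```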

      • 'TO method' is not commonly used anymore and hence it was unclear to this Reviewer. Reticulocytes should be analyzed with proper gating, using cell surface markers.

      We appreciate your observation regarding the methodology we used to analyze reticulocytes in our study. We value your recommendation to use cell surface markers for gating, which indeed represents a more modern and accurate approach. However, as reticulocyte identification is not the central focus of our investigation, we opted for the TO staining method because of its simplicity and the credibility of its results. We adopted the TO staining method following a published protocol (Sci Rep, 2018, 8(1):12793), primarily owing to its established use and demonstrated efficacy in reticulocyte identification.

      • The description of 'phagocytosis of E. coli and RBCs' in the Methods section is unclear and incomplete. The Results section suggests that for the biotinylated RBCs, phagocytosis? or retention? Of RBCs was quantified in vivo, upon transfusion. However, the Methods section suggests either in vitro/ex vivo approach. It is vague what was indeed performed and how in detail. If RBC transfusion was done, this should be properly described. Of note, biotinylation of RBCs is typically done in vivo only, being a first step in RBC lifespan assay. The such assay is missing in the manuscript. Also, it is not clear if the detection of biotinylated RBCs was performed in permeablized cells (this would be required).

      Thanks for the comments. In our initial methodology, we employed Cy5.5-labeled Escherichia coli to probe phagocytic function, albeit with the understanding that this may not constitute the most ideal model for phagocytosis detection within this context (in light of recommendations from other reviewers, we have removed the E. coli phagocytosis results from this revision, as they predominantly mirror immune function). Our fundamental aim was to ascertain whether HH compromises the erythrophagocytic potential of splenic macrophages. In pursuit of this, we subsequently analyzed the clearance of biotinylated RBCs in both the bloodstream and spleen to assess phagocytic functionality in vivo.

      In the present study, instead of transfusing biotinylated RBCs into mice, we opted to inject N-Hydroxysuccinimide (NHS)-biotin into the bloodstream. NHS-biotin is capable of binding with cell membranes in vivo and can be recognized by streptavidin-fluorescein isothiocyanate (FITC) after cells are extracted from the blood or spleen in vitro. Consequently, biotin-labeled RBCs were detectable in both the blood and spleen following NHS-biotin injection for a duration of 21 days.

      Ultimately, we employed flow cytometry to analyze the NHS-biotin labeled RBCs in the blood or spleen. This method facilitates the detection of live cells and is not applicable to permeabilized cells. We believe this approach better aligns with our investigative goals and offers a more robust evaluation of erythrophagocytic function under hypoxic conditions.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      Jocher, Janssen, et al examine the robustness of comparative functional genomics studies in primates that make use of induced pluripotent stem cell-derived cells. Comparative studies in primates, especially amongst the great apes, are generally hindered by the very limited availability of samples, and iPSCs, which can be maintained in the laboratory indefinitely and defined into other cell types, have emerged as promising model systems because they allow the generation of data from tissues and cells that would otherwise be unobservable.

      Undirected differentiation of iPSCs into many cell types at once, using a method known as embryoid body differentiation, requires researchers to manually assign all cell types in the dataset so they can be correctly analysed. Typically, this is done using marker genes associated with a specific cell type. These are defined a priori, and have historically tended to be characterised in mice and humans and then employed to annotate other species. Jocher, Janssen, et al ask if the marker genes and features used to define a given cell type in one species are suitable for use in a second species, and then quantify the degree of usefulness of these markers. They find that genes that are informative and cell type specific in a given species are less valuable for cell type identification in other species, and that this value, or transferability, drops off as the evolutionary distance between species increases.

      This paper will help guide future comparative studies of gene expression in primates (and more broadly) as well as add to the growing literature on the broader challenges of selecting powerful and reliable marker genes for use in single-cell transcriptomics.

      Strengths:

      Marker gene selection and cell type annotation is a challenging problem in scRNA studies, and successful classification of cells often requires manual expert input. This can be hard to reproduce across studies, as, despite general agreement on the identity of many cell types, different methods for identifying marker genes will return different sets of genes. The rise of comparative functional genomics complicates this even further, as a robust marker gene in one species need not always be as useful in a different taxon. The finding that so many marker genes have poor transferability is striking, and by interrogating the assumption of transferability in a thorough and systematic fashion, this paper reminds us of the importance of systematically validating analytical choices. The focus on identifying how transferability varies across different types of marker genes (especially when comparing TFs to lncRNAs), and on exploring different methods to identify marker genes, also suggests additional criteria by which future researchers could select robust marker genes in their own data.

      The paper is built on a substantial amount of clearly reported and thoroughly considered data, including EBs and cells from four different primate species - humans, orangutans, and two macaque species. The authors go to great lengths to ensure the EBs are as comparable as possible across species, and take similar care with their computational analyses, always erring on the side of drawing conservative conclusions that are robustly supported by their data over more tenuously supported ones that could be impacted by data processing artefacts such as differences in mappability, etc. For example, I like the approach of using liftoff to robustly identify genes in non-human species that can be mapped to and compared across species confidently, rather than relying on the likely incomplete annotation of the non-human primate genomes. The authors also provide an interactive data visualisation website that allows users to explore the dataset in depth, examine expression patterns of their own favourite marker genes and perform the same kinds of analyses on their own data if desired, facilitating consistency between comparative primate studies.

      We thank the Reviewer for their kind assessment of our work.

      Weaknesses and recommendations:

      (1) Embryoid body generation is known to be highly variable from one replicate to the next for both technical and biological reasons, and the authors do their best to account for this, both by their testing of different ways of generating EBs, and by including multiple technical replicates/clones per species. However, there is still some variability that could be worth exploring in more depth. For example, the orangutan seems to have differentiated preferentially towards cardiac mesoderm whereas the other species seemed to prefer ectoderm fates, as shown in Figure 2C. Likewise, Supplementary Figure 2C suggests a significant unbalance in the contributions across replicates within a species, which is not surprising given the nature of EBs, while Supplementary Figure 6 suggests that despite including three different clones from a single rhesus macaque, most of the data came from a single clone. The manuscript would be strengthened by a more thorough exploration of the intra-species patterns of variability, especially for the taxa with multiple biological replicates, and how they impact the number of cell types detected across taxa, etc.

      You are absolutely correct in pointing out that the large clonal variability in cell type composition is a challenge for our analysis. We also noted the odd behavior of the orangutan EBs, and their underrepresentation of ectoderm. There are many possible sources for these variable differentiation propensities: clone, sample origin (in this case urine) and individual. However, unfortunately for the orangutan, we have only one individual and one sample origin and thus cannot say whether this germ layer preference says something about the species or is due to our specific sample.

      Because of this high variability from multiple sources, getting enough cell types with an appreciable overlap between species was limiting to analyses. In order to be able to derive meaningful conclusions from intra-species analyses and the impact of different sources of variation on cell type propensity, we would need to sequence many more EBs with an experimental design that balances possible sources of variation. This would go beyond the scope of this study.

      Instead, here we control for intra-species variation in our analyses as much as possible: For the analysis of cell type specificity and conservation the comparison is relative for the different specificity degrees (Figure 3C).  For the analysis of marker gene conservation, we explicitly take intra-species variation into account (Figure 4D).

      The same holds for the temporal aspect of the data, which is not really discussed in depth despite being a strength of the design. Instead, days 8 and 16 are analysed jointly, without much attention being paid to the possible differences between them.

      Concerning the temporal aspect, we indeed deliberately did not include an explicit comparison of day 8 and day 16 EBs, because we felt that it was not directly relevant to our main message. Our pseudotime analysis showed that the differences between the two time points were a matter of degree rather than of kind. All major lineages were already present at day 8, and even though day 8 cells had on average earlier pseudotimes, there was a large overlap in the pseudotime distributions between the two sampling time points (Author response image 1). That is why we decided to analyse the data together.

      Are EBs at day 16 more variable between species than at day 8? Is day 8 too soon to do these kinds of analyses?

      When we started the experiment, we simply did not know what to expect. We were worried that cell types at day 8 might be too transient, but longer culture can also introduce biases. That is why we wanted to look at two time points; however, as mentioned above, the differences are a matter of degree.

      Concerning the cell type composition: yes, day 16 EBs are more heterogeneous than day 8 EBs. Firstly, older EBs have more distinguishable cell types and hence even if all EBs had identical composition, the sampling variance would be higher given that we sampled a similar number of cells from both time points. Secondly, in order to grow EBs for a longer time, we moved them from floating to attached culture on day 8 and it is unclear how much variance is added by this extra handling step.
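
      The sampling-variance argument above can be illustrated with a small sketch. It assumes, purely for illustration, that cells are drawn by multinomial sampling from equally frequent cell types; the function name and the numbers are hypothetical and are not taken from the study. Under that assumption, the relative uncertainty of each estimated cell type proportion grows with the number of cell types when the number of sampled cells is held fixed.

```python
import math


def proportion_cv(n_cells: int, n_types: int) -> float:
    """Coefficient of variation of one cell type's estimated proportion.

    Assumes multinomial sampling of n_cells from n_types equally
    frequent cell types (an illustrative simplification).
    """
    p = 1.0 / n_types
    # Var(p_hat) = p * (1 - p) / n, so CV = sqrt((1 - p) / (n * p))
    return math.sqrt((1.0 - p) / (n_cells * p))


# Same number of sampled cells, increasing number of cell types
for k in (5, 10, 20):
    print(k, round(proportion_cv(1000, k), 3))
```

      With the number of cells fixed, the coefficient of variation increases as more cell types are distinguished, which is the effect described for the older EBs.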

      Are markers for earlier developmental progenitors better/more transferable than those for more derived cell types?

      We did not see any differences in the marker conservation between early and late cell types, but we have too little data to say whether this carries biological meaning.

      Author response image 1.

      Pseudotime analysis for a differentiation trajectory towards neurons. Single cells were first aggregated into metacells per species using SEACells (Persad et al. 2023). Pluripotent and ectoderm metacells were then integrated across all four species using Harmony and a combined pseudotime was inferred with Slingshot (Street et al. 2018), specifying iPSCs as the starting cluster. Here, lineage 3 is shown, illustrating a differentiation towards neurons. (A) PHATE embedding colored by pseudotime (Moon et al. 2019). (B) PHATE embedding colored by celltype. (C) Pseudotime distribution across the sampling timepoints (day 8 and day 16) in different species.

      (2) Closely tied to the point above, by necessity the authors collapse their data into seven fairly coarse cell types and then examine the performance of canonical marker genes (as well as those discovered de novo) across the species. However some of the clusters they use are somewhat broad, and so it is worth asking whether the lack of specificity exhibited by some marker genes and driving their conclusions is driven by inter-species heterogeneity within a given cluster.

      Author response image 2.

      UMAP visualization for the Harmony-integrated dataset across all four species for the seven shared cell types, colored by cell type identity (A) and species (B).

      Good point, if we understand correctly, the concern is that in our relatively broadly defined cell types, species are not well mixed and that this in turn is partly responsible for marker gene divergence. This problem is indeed difficult to address, because most approaches to evaluate this require integration across species which might lead to questionable results (see our Discussion).

      Nevertheless, we attempted an integration across all four species. To this end, we subset the cells for the 7 cell types that we found in all four species and visualized cell types and species in the UMAPs above (Author response image 2).

      We see that cardiac fibroblasts appear poorly integrated in the UMAP, but they still have very transferable marker genes across species. We quantified integration quality using the cell-specific mixing score (cms) (Lütge et al. 2021) and indeed found that the proportion of well integrated cells is lowest for cardiac fibroblasts (Author response image 3A). On the other end of the cms spectrum, neural crest cells appear to have the best integration across species, but their marker transferability between species is rather worse than for cardiac fibroblasts (Supplementary Figure 9). Cell-type wise calculated rank-biased overlap scores that we use for marker gene conservation show the same trends (Author response image 3B) as the F1 scores for marker gene transferability.  Hence, given our current dataset we do not see any indication that the low marker gene conservation is a result of too broadly defined cell types.
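
      As a rough sketch of how a "proportion of well integrated cells" can be summarised per cell type, the snippet below thresholds a per-cell score and reports the fraction above the cutoff. The cutoff of 0.05 follows the caption of Author response image 3; the function name, the input layout and the toy values are assumptions made for illustration and do not reproduce the CellMixS implementation.

```python
from collections import defaultdict


def fraction_well_mixed(cms_scores, cell_types, threshold=0.05):
    """Per cell type, the fraction of cells whose score exceeds threshold."""
    if len(cms_scores) != len(cell_types):
        raise ValueError("one score per cell is required")
    total = defaultdict(int)
    above = defaultdict(int)
    for score, cell_type in zip(cms_scores, cell_types):
        total[cell_type] += 1
        if score > threshold:
            above[cell_type] += 1
    return {ct: above[ct] / total[ct] for ct in total}


# Hypothetical toy values, not data from the study
scores = [0.01, 0.20, 0.03, 0.40, 0.60, 0.02]
types = ["fibroblast"] * 3 + ["neural_crest"] * 3
print(fraction_well_mixed(scores, types))
```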

      Author response image 3.

      (A) Evaluation of species mixing per cell type in the Harmony-integrated dataset, quantified by the fraction of cells with an adjusted cell-specific mixing score (cms) above 0.05. (B) Summary of rank-biased overlap (RBO) scores per cell type to assess concordance of marker gene rankings for all species pairs.
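
      For readers unfamiliar with rank-biased overlap, the following is a minimal sketch of a truncated RBO between two ranked marker gene lists. It is an assumption-laden illustration: the persistence parameter p = 0.9, the truncation at the shorter list length without extrapolation, and the toy gene lists are all chosen here for demonstration and are not necessarily the settings used in our analysis.

```python
def rbo_truncated(ranking_a, ranking_b, p=0.9):
    """Truncated rank-biased overlap of two rankings without duplicates.

    Sums (1 - p) * p**(d - 1) * overlap(d) / d over depths d = 1..k,
    where k is the length of the shorter ranking. No extrapolation
    beyond depth k is applied, so identical rankings score 1 - p**k.
    """
    depth = min(len(ranking_a), len(ranking_b))
    seen_a, seen_b = set(), set()
    overlap = 0
    score = 0.0
    for d in range(1, depth + 1):
        a, b = ranking_a[d - 1], ranking_b[d - 1]
        if a == b:
            overlap += 1
        else:
            # an item counts once it has appeared in the other ranking's prefix
            overlap += (a in seen_b) + (b in seen_a)
        seen_a.add(a)
        seen_b.add(b)
        score += (p ** (d - 1)) * overlap / d
    return (1 - p) * score


# Toy rankings (illustrative only)
human = ["SOX2", "PAX6", "NES", "TUBB3"]
macaque = ["PAX6", "SOX2", "NES", "TUBB3"]
print(rbo_truncated(human, human), rbo_truncated(human, macaque))
```

      Because the weights decay geometrically with depth, disagreements at the top of the rankings lower the score more than disagreements further down.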

      Reviewer #2 (Public review):

      Summary:

      The authors present an important study on identifying and comparing orthologous cell types across multiple species. This manuscript focuses on characterizing cell types in embryoid bodies (EBs) derived from induced pluripotent stem cells (iPSCs) of four primate species, humans, orangutans, cynomolgus macaques, and rhesus macaques, providing valuable insights into cross-species comparisons.

      Strengths:

      To achieve this, the authors developed a semi-automated computational pipeline that integrates classification and marker-based cluster annotation to identify orthologous cell types across primates. This study makes a significant contribution to the field by advancing cross-species cell type identification.

      We thank the reviewer for their positive and thoughtful feedback.

      Weaknesses:

      However, several critical points need to be addressed.

      (1) Use of Liftoff for GTF Annotation

      The authors used Liftoff to generate GTF files for Pongo abelii, Macaca fascicularis, and Macaca mulatta by transferring the hg38 annotation to the corresponding primate genomes. However, it is unclear why they did not use species-specific GTF files, as all these genomes have existing annotations. Why did the authors choose not to follow this approach?

      As Reviewer 1 also points out, we have observed that the annotations of non-human primates often have truncated 3’UTRs. This is especially problematic for 3’ UMI transcriptome data such as the 10x dataset that we present here. To illustrate this, we compared the Liftoff annotation derived from Gencode v32, which we also used throughout our manuscript, to the Ensembl gene annotation Macaca_fascicularis_6.0.111. We used transcriptomes from human and cynomolgus iPSC bulk RNA-seq (Kliesmete et al. 2024) generated with the Prime-seq protocol (Janjic et al. 2022), which is very similar to 10x in that it also uses 3’ UMIs. On average, using Liftoff produces higher counts than the Ensembl annotation (Author response image 4A). Moreover, when comparing across species, using Ensembl for the macaque leads to an asymmetry in differentially expressed genes, with apparently many more up-regulated genes in humans. In contrast, when we use the Liftoff annotation, we detect fewer DE genes, and a similar number of genes is up-regulated in macaques as in humans (Author response image 4B). We think that the many additional DE genes are artifacts due to mismatched annotations in humans and cynomolgus macaques. We illustrate this for the transcription factor SALL4 in Author response image 4C,D. The Ensembl annotation reports 2 transcripts, while Liftoff from Gencode v32 suggests 5 transcripts, one of which has a longer 3’UTR. This longer transcript is also supported by Nanopore data from macaque iPSCs. The truncation of the 3’UTR in this case leads to an underestimation of SALL4 expression in macaques, and hence SALL4 is detected as up-regulated in humans (DESeq2: LFC = 1.34, p-adj < 2e-9). In contrast, when using the Liftoff annotation, SALL4 does not appear to be DE between humans and macaques (LFC = 0.33, p-adj = 0.20).

      Author response image 4. 

      (A) UMI counts per gene for the same cynomolgus macaque iPSC samples. On the x-axis, the GTF file from Ensembl Macaca_fascicularis_6.0.111 was used for counting; on the y-axis, we used our filtered Liftoff annotation that transferred the human gene models from Gencode v32. (B) The number of DE genes between human and cynomolgus iPSCs detected with DESeq2. For Liftoff, we counted human samples using Gencode v32 and compared them to macaque samples counted with the Liftoff annotation of the same human gene models on macFas6. For Ensembl, we used Gencode v32 for the human and Ensembl Macaca_fascicularis_6.0.111 for the macaque. For both comparisons, we subset the genes to one-to-one orthologues as annotated in BioMart. Up- and down-regulation is relative to human expression. (C) Read counts for one example gene, SALL4. In addition to the Liftoff and Ensembl annotations, we also used transcripts derived from Nanopore cDNA sequencing of cynomolgus iPSCs. (D) Gene models for SALL4 in macFas6 coordinates and coverage from iPSC Prime-seq bulk RNA sequencing.
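
      To put the two SALL4 log2 fold changes quoted above on a linear scale, a short conversion can help. This is only a worked arithmetic sketch; the helper name is hypothetical, and the two LFC values are the ones reported in the response.

```python
def fold_change(log2_fc: float) -> float:
    """Convert a log2 fold change into a linear fold change."""
    return 2.0 ** log2_fc


# LFC values for SALL4 quoted in the response (human relative to macaque)
for label, lfc in (("Ensembl", 1.34), ("Liftoff", 0.33)):
    print(f"{label}: LFC={lfc} -> {fold_change(lfc):.2f}-fold")
```

      An LFC of 1.34 corresponds to roughly a 2.5-fold apparent difference, whereas an LFC of 0.33 corresponds to roughly 1.3-fold, which gives a sense of how much of the apparent human up-regulation is attributable to the annotation.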

      (2) Transcript Filtering and Potential Biases

      The authors excluded transcripts with partial mapping (<50%), low sequence identity (<50%), or excessive length differences (>100 bp and >2× length ratio). Such filtering may introduce biases in read alignment. Did the authors evaluate the impact of these filtering choices on alignment rates?

      We excluded those transcripts from the analysis in both species because they conflate sequence-annotation differences with expression differences. The focus of our study is on regulatory evolution, and we knowingly omit marker differences that are due to a marker being mutated away. We will make this clearer in the text of a revised version.
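
      The filtering criteria summarised in the reviewer's comment can be written as a simple predicate. The sketch below is not our pipeline code: the record type and its field names are hypothetical, and it assumes that the two length conditions (>100 bp difference and >2× length ratio) must both hold for a transcript to be excluded.

```python
from dataclasses import dataclass


@dataclass
class LiftedTranscript:
    # All field names are hypothetical, chosen for this illustration.
    coverage: float      # fraction of the human transcript that mapped (0-1)
    identity: float      # sequence identity of the mapped part (0-1)
    human_length: int    # transcript length in the human annotation (bp)
    lifted_length: int   # transcript length after lift-over (bp)


def keep_transcript(t: LiftedTranscript) -> bool:
    """Return True if the transcript passes all three filters."""
    if t.coverage < 0.5 or t.identity < 0.5:
        return False
    shorter = min(t.human_length, t.lifted_length)
    longer = max(t.human_length, t.lifted_length)
    if shorter <= 0:
        return False  # guard against empty or malformed records
    # Assumption: both length conditions must hold to exclude a transcript
    return not (longer - shorter > 100 and longer / shorter > 2)


print(keep_transcript(LiftedTranscript(0.9, 0.95, 2000, 2150)))
print(keep_transcript(LiftedTranscript(0.4, 0.95, 2000, 2000)))
```

      Applying the same predicate to both species keeps the set of compared transcripts identical, so that remaining differences reflect expression rather than annotation.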

      (3) Data Integration with Harmony

      The methods section does not specify the parameters used for data integration with Harmony. Including these details would clarify how cross-species integration was performed.

      We want to stress that none of our conservation and marker gene analyses relies on cross-species integration. We only used the Harmony-integrated data for visualisation in Figure 1 and for the rough germ-layer check in Supplementary Figure S3. We will add a better description in the revised version.

      References

      Janjic, Aleksandar, Lucas E. Wange, Johannes W. Bagnoli, Johanna Geuder, Phong Nguyen, Daniel Richter, Beate Vieth, et al. 2022. “Prime-Seq, Efficient and Powerful Bulk RNA Sequencing.” Genome Biology 23 (1): 88.

      Kliesmete, Zane, Peter Orchard, Victor Yan Kin Lee, Johanna Geuder, Simon M. Krauß, Mari Ohnuki, Jessica Jocher, Beate Vieth, Wolfgang Enard, and Ines Hellmann. 2024. “Evidence for Compensatory Evolution within Pleiotropic Regulatory Elements.” Genome Research 34 (10): 1528–39.

      Lütge, Almut, Joanna Zyprych-Walczak, Urszula Brykczynska Kunzmann, Helena L. Crowell, Daniela Calini, Dheeraj Malhotra, Charlotte Soneson, and Mark D. Robinson. 2021. “CellMixS: Quantifying and Visualizing Batch Effects in Single-Cell RNA-Seq Data.” Life Science Alliance 4 (6): e202001004.

      Moon, Kevin R., David van Dijk, Zheng Wang, Scott Gigante, Daniel B. Burkhardt, William S. Chen, Kristina Yim, et al. 2019. “Visualizing Structure and Transitions in High-Dimensional Biological Data.” Nature Biotechnology 37 (12): 1482–92.

      Persad, Sitara, Zi-Ning Choo, Christine Dien, Noor Sohail, Ignas Masilionis, Ronan Chaligné, Tal Nawy, et al. 2023. “SEACells Infers Transcriptional and Epigenomic Cellular States from Single-Cell Genomics Data.” Nature Biotechnology 41 (12): 1746–57.

      Street, Kelly, Davide Risso, Russell B. Fletcher, Diya Das, John Ngai, Nir Yosef, Elizabeth Purdom, and Sandrine Dudoit. 2018. “Slingshot: Cell Lineage and Pseudotime Inference for Single-Cell Transcriptomics.” BMC Genomics 19 (1): 477.

    1. Author Response

      We would like to thank the senior editor, the reviewing editor and all the reviewers for taking the time to review our manuscript and for their appreciation of our study. We are pleased that all of you found strengths in our work and provided comments to strengthen it further. We sincerely appreciate the valuable comments and suggestions, which we believe will help us to further improve the quality of our work.

      Reviewer 1

      The manuscript by Dubey et al. examines the function of the acetyltransferase Tip60. The authors show that (auto)acetylation of a lysine residue in Tip60 is important for its nuclear localization and liquid-liquid-phase-separation (LLPS). The main observations are: (i) Tip60 is localized to the nucleus, where it typically forms punctate foci. (ii) An intrinsically disordered region (IDR) within Tip60 is critical for the normal distribution of Tip60. (iii) Within the IDR the authors show that a lysine residue (K187), that is auto-acetylated, is critical. Mutation of that lysine residue to a non-acetylable arginine abolishes the behavior. (iv) biochemical experiments show that the formation of the punctate foci may be consistent with LLPS.

      On balance, this is an interesting study that describes the role of acetylation of Tip60 in controlling its biochemical behavior as well as its localization and function in cells. The authors mention in their Discussion section other examples showing that acetylation can change the behavior of proteins with respect to LLPS; depending on the specific context, acetylation can promote (as here for Tip60) or impair LLPS.

      Strengths:

      The experiments are largely convincing and appear to be well executed.

      Weaknesses:

      The main concern I have is that all in vivo (i.e. in cells) experiments are done with overexpression in Cos-1 cells, in the presence of the endogenous protein. No attempt is made to use e.g. cells that would be KO for Tip60 in order to have a cleaner system or to look at the endogenous protein. It would be reassuring to know that what the authors observe with highly overexpressed proteins also takes place with endogenous proteins.

      Response: The main reason for performing these experiments in an overexpression system was to generate different point mutants and deletion mutants of TIP60 and analyse their effects on its properties and functions. To validate our observations from the overexpression system, we also examined the localization pattern of endogenous TIP60 by IFA, and the results show a foci pattern within the nucleus similar to that observed with overexpressed TIP60 protein (Figure 4A). However, we understand the reviewer's concern and agree to repeat some of the overexpression experiments under endogenous TIP60 knockdown conditions using siRNA or shRNA against the 3’ UTR.

      Also, it is not clear how often the experiments have been repeated and additional quantifications (e.g. of western blots) would be useful.

      Response: The experiments were performed as independent biological replicates (n=3), and this is mentioned in the figure legends. Regarding the suggestion to quantify Western blots, we would like to point out that, for blots requiring quantitative estimation (such as Figures 2F and 6H), graphs showing the quantified values with p-values had already been included. In addition, as suggested, quantification for Figure 6D will be performed and added to the revised version.

      In addition, regarding the LLPS description (Figure 1), it would be important to show the wetting behaviour and the temperature-dependent reversibility of the droplet formation.

      Response: We appreciate the suggestion, and we will perform these assays and include the results in the revised version.

      In Fig 3C the mutant (K187R) Tip60 is cytoplasmic, but still appears to form foci. Is this still reflecting phase separation, or some form of aggregation?

      Response: The TIP60 (K187R) mutant remains cytosolic with a homogeneous distribution, as shown in Figure 2E. When co-expressed with TIP60 partners such as PXR or p53, this mutant protein also remains homogeneously distributed in the cytosol. However, when co-expressed with TIP60 (Wild-type) protein, the mutant protein still remains cytosolic, but some foci-like pattern is also observed at the nuclear periphery, which we believe could be accumulated aggregates.

      Reviewer 2

      The manuscript "Autoacetylation-mediated phase separation of TIP60 is critical for its functions" by Dubey S. et al reported that the acetyltransferase TIP60 undergoes phase separation in vitro and cell nuclei. The intrinsically disordered region (IDR) of TIP60, particularly K187 within the IDR, is critical for phase separation and nuclear import. The authors showed that K187 is autoacetylated, which is important for TIP60 nuclear localization and activity on histone H4. The authors did several experiments to examine the function of K187R mutants including chromatin binding, oligomerization, phase separation, and nuclear foci formation. However, the physiological relevance of these experiments is not clear since TIP60 K187R mutants do not get into nuclei. The authors also functionally tested the cancer-derived R188P mutant, which mimics K187R in nuclear localization, disruption of wound healing, and DNA damage repair. However, similar to K187R, the R188P mutant is also deficient in nuclear import, and therefore, its defects cannot be directly attributed to the disruption of the phase separation property of TIP60. The main deficiency of the manuscript is the lack of support for the conclusion that "autoacetylation-mediated phase separation of TIP60 is critical for its functions".

      This study offers some intriguing observations. However, the evidence supporting the primary conclusion, specifically regarding the necessity of the intrinsically disordered region (IDR) and K187ac of TIP60 for its phase separation and function in cells, lacks sufficient support and warrants more scrutiny. Additionally, certain aspects of the experimental design are perplexing and lack controls to exclude alternative interpretations. The manuscript can benefit from additional editing and proofreading to improve clarity.

      Response: We understand the point raised by the reviewer; however, we would like to draw their attention to the data in which we clearly demonstrated that acetylation of lysine 187 within the IDR of TIP60 is required for its phase separation (Figure 2J). We would also like to draw the reviewer’s attention to other TIP60 mutants within the IDR (R177H, R188H, K189R), all of which enter the nucleus and form phase-separated foci. The cancer-associated mutation at R188 (R188P) behaves similarly to K187R because it also hampers TIP60 acetylation at the adjacent K187 residue. Our in vitro and in cellulo results clearly demonstrate that autoacetylation of TIP60 at K187 within its IDR is critical for multiple functions, including its translocation into the nucleus, its protein-protein interactions and its oligomerization, which are prerequisites for phase separation of TIP60.

      There are two putative NLS sequences (NLS #1 from aa145; NLS #2 from aa184) in TIP60, both of which are within the IDR. Deletion of the whole IDR is therefore expected to abolish the nuclear localization of TIP60. Since K187 is within NLS #2, the cytoplasmic localization of the IDR and K187R mutants may not be related to the ability of TIP60 to phase separation.

      Response: We do not dispute the presence of putative NLSs within the IDR of TIP60; however, our results with different mutations within the IDR (K76, K80, K148, K150, R177, R178, R188, K189) clearly demonstrate that only acetylation of the K187 residue is critical for shuttling TIP60 into the nucleus, while none of the other mutants located within these putative NLS regions showed any impact on TIP60’s nuclear shuttling. As mentioned in our discussion, autoacetylation of TIP60’s K187 may induce local structural modifications in its IDR that are critical for translocating TIP60 into the nucleus, where it undergoes the phase separation critical for its functions. In a similar earlier example, acetylation of a lysine within the NLS region of TyrRS by PCAF promotes its nuclear localization (Cao X et al 2017, PNAS). The IDR (which also contains the K187 site) is important for phase separation once the protein enters the nucleus. This could be the cell’s mechanism to prevent unwarranted action of TIP60 until it enters the nucleus and phase separates on chromatin at appropriate locations.

      The chromatin-binding activity of TIP60 depends on HAT activity, but not phase-separation (Fig 1I), (Fig 2B). How do the authors reconcile the fact that the K187R mutant is able to bind to chromatin with lower activity than the HAT mutant (Fig 2F, 2I)?

      Response: K187 acetylation is required for TIP60’s nuclear translocation but is not critical for chromatin binding. When the soluble fraction is prepared in the fractionation experiment, the nuclear membrane is disrupted, so the TIP60 (K187R) mutant is no longer hindered from accessing chromatin and can load onto it (although not as efficiently as the Wild-type protein). Efficient chromatin binding requires autoacetylation of other lysine residues in TIP60, which might be hampered by reduced catalytic activity or might be insufficient to maintain equilibrium with HDAC activity inside the nucleus. In the case of K187R, the reduced autoacetylation is captured while the protein is in the cytosol. During fractionation, once this mutant has access to chromatin, it might autoacetylate other lysine residues critical for chromatin loading (note that the catalytic domain is intact in this mutant). This is evident from the hyper-autoacetylation of the Wild-type protein compared to the K187R or HAT mutant proteins. We would like to point out that phase separation occurs only after efficient chromatin loading of TIP60, which is why, under in cellulo conditions, both K187R (which cannot enter the nucleus) and the HAT mutant (which enters the nucleus but fails to bind chromatin efficiently) fail to form phase-separated nuclear punctate foci.

      The DIC images of phase separation in Fig 2I need to be improved. The image for K187R showed the irregular shape of the condensates, which suggests particles in solution or on the slide. The authors may need to use fluorescent-tagged TIP60 in the in vitro LLPS experiments.

      Response: We believe this comment refers to Figure 2J. The irregularly shaped condensates observed for TIP60 K187R are unique to the mutant protein and are not caused by particles on the slide. We would like to draw the reviewer’s attention to supplementary Figure S2A, where the DIC images for TIP60 (Wild-type) protein tested under different protein and PEG8000 conditions are completely clear wherever the protein did not form phase-separated droplets, ruling out the possibility of particles in solution or on the slides.

      The authors mentioned that the HAT mutant of TIP60 does not phase separate, which needs to be included.

      Response: We have already added the image of RFP-TIP60 (HAT mutant) in supplementary Fig S4A (panel 2) in the manuscript.

      Related to Point 3, the HAT mutant that doesn't form punctate foci by itself, can incorporate into WT TIP60 (Fig 5A). In vitro LLPS assay for WT, HAT, and K187R mutants with or without acetylation should be included. WT and mutant TIP can be labelled with GFP and RFP, respectively.

      Response: We would like to draw the reviewer’s attention to the co-expression experiments in Figure 5, where the Wild-type protein (in both tagged and untagged conditions) is able to phase separate and form punctate foci with the co-expressed HAT mutant protein (with depleted autoacetylation capacity). We believe these in cellulo experiments already answer the questions that the reviewer suggests addressing through in vitro experiments.

      Fig 3A and 3B showed that neither K187 mutant nor HAT mutant could oligomerize. If both experiments were conducted in the absence of in vitro acetylation, how do the authors reconcile these results?

      Response: We thank the reviewer for highlighting our oversight in omitting the mention of acetyl coenzyme A here. To induce acetylation under in vitro conditions, we have added 10 µM acetyl CoA into the reactions depicted in Figure 3A and 3B. The information for acetyl CoA for Figure 3B was already included in the GST-pull down assay (material and methods section). We will add the same in the oligomerization assay of material and methods in the revised manuscript.

      In Fig 4, the colocalization images showed little overlap between TIP60 and nuclear speckle (NS) marker SC35, indicating that the majority of TIP60 localized in the nuclear structure other than NS. Have the authors tried to perturbate the NS by depleting the NS scaffold protein and examining TIP60 foci formation? Do PXR and TP53 localize to NS?

      Response: Under normal conditions, the majority of TIP60 is not localized in nuclear speckles (NS), so we believe that perturbing NS will not have a significant effect on TIP60 foci formation. Interestingly, a recent study by the Shelley Berger group (Alexander KA et al Mol Cell. 2021 15;81(8):1666-1681) showed that p53 localizes to NS to regulate a subset of its target genes. We have mentioned this in our discussion section. No information is available about the localization of PXR in NS.

      Were TIP60 substrates, H4 (or NCP), PXR, TP53, present in TIP60 condensates in vitro? It's interesting to see both PXR and TP53 had homogenous nuclear signals when expressed together with K187R, R188P (Fig 6E, 6G), or HAT (Suppl Fig S4A) mutants. Are PXR or TP53 nuclear foci dependent on their acetylation by TIP60? This can and should be tested.

      Response: Both p53 and PXR are known to be acetylated by TIP60. In the case of PXR, TIP60 acetylates PXR at lysine 170, and this TIP60-mediated acetylation of PXR at K170 is important for the TIP60-PXR foci, which we now know are formed by phase separation (Bakshi K et al Sci Rep. 2017 Jun 16;7(1):3635).

      Since R188P mutant, like K187R, does not get into the nuclei, it is not suitable to use this mutant to examine the functional relevance of phase separation for TIP60. The authors need to find another mutant in IDR that retains nuclear localization and overall HAT activity but specifically disrupts phase separation. Otherwise, the conclusion needs to be restated. All cancer-derived mutants need to be tested for LLPS in vitro.

      Response: We appreciate the reviewer’s point here, but it is important to note that the objective of these experiments is to understand the impact of K187R (critical in multiple aspects of TIP60, including phase separation) and R188P (a naturally occurring cancer-associated mutation that behaves similarly to K187R) on TIP60’s activities in order to determine their functional relevance. The reviewer’s suggestion to identify an IDR mutant that fails to phase separate but retains nuclear localization and catalytic activity can be examined in future studies.

      For all cellular experiments, it is not mentioned whether endogenous TIP60 was removed and absent in the cell lines used in this study. It's important to clarify this point because the localization and function of mutant TIP60 are affected by WT TIP60 (Fig 5).

      Response: Endogenous TIP60 was present in in cellulo experiments, however as suggested by reviewer 1 we will perform some of the in cellulo experiments under endogenous TIP60 knockdown condition to validate our findings.

      It is troubling that H4 peptide is used for in vitro HAT assay since TIP60 has much higher activity on nucleosomes and its preferred substrates include H2A.

      Response: The purpose of using the H4 peptide in the HAT assay is to determine the impact of mutations on TIP60’s catalytic activity. As H4 is one of the major histone substrates of TIP60, we believe it satisfies the objective of the experiments.

      Reviewer 3

      This study presents results arguing that the mammalian acetyltransferase Tip60/KAT5 auto-acetylates itself on one specific lysine residue before the MYST domain, which in turn favors not only nuclear localization but also condensate formation on chromatin through LLPS. The authors further argue that this modification is responsible for the bulk of Tip60 autoacetylation and acetyltransferase activity towards histone H4. Finally, they suggest that it is required for association with txn factors and in vivo function in gene regulation and DNA damage response.

      These are very wide and important claims and, while some results are interesting and intriguing, there is not really close to enough work performed/data presented to support them. In addition, some results are redundant between them, lack consistency in the mutants analyzed, and show contradiction between them. The most important shortcoming of the study is the fact that every single experiment in cells was done in over-expressed conditions, from transiently transfected cells. It is well known that these conditions can lead to non-specific mass effects, cellular localization not reflecting native conditions, and disruption of native interactome. On that topic, it is quite striking that the authors completely ignore the fact that Tip60 is exclusively found as part of a stable large multi-subunit complex in vivo, with more than 15 different proteins. Thus, arguing for a single residue acetylation regulating condensate formation and most Tip60 functions while ignoring native conditions (and the fact that Tip60 cannot function outside its native complex) does not allow me to support this study.

      Response: We appreciate the reviewer’s point here, but it is important to note that the main purpose of using an overexpression system in the study is to analyse the effect of the different point/deletion mutations generated in TIP60. We have overexpressed proteins with different tags (GFP or RFP) or without tags (Figure 3C, Figure 5) to confirm that the behaviour of the protein remains unperturbed by the presence of tags. To validate this, we have also examined the localization of endogenous TIP60 protein, which shows localization behaviour similar to that of the overexpressed protein. We would like to point out that there are several reports in the literature where similar overexpression systems are used to determine the functions of TIP60 and its mutants. The nuclear foci pattern observed for TIP60 in our studies has also been reported by several other groups.

      Sun, Y., et. al. (2005) A role for the Tip60 histone acetyltransferase in the acetylation and activation of ATM. Proc Natl Acad Sci U S A, 102(37):13182-7.

      Kim, C.-H. et al. (2015) ‘The chromodomain-containing histone acetyltransferase TIP60 acts as a code reader, recognizing the epigenetic codes for initiating transcription’, Bioscience, Biotechnology, and Biochemistry, 79(4), pp. 532–538.

      Wee, C. L. et al. (2014) ‘Nuclear Arc Interacts with the Histone Acetyltransferase Tip60 to Modify H4K12 Acetylation(1,2,3).’, eNeuro, 1(1). doi: 10.1523/ENEURO.0019-14.2014.

      However, as a precaution, and as also suggested by the other reviewers, we will perform some of these overexpression experiments in the absence of endogenous TIP60 by using 3’ UTR-specific siRNA/shRNA.

      We thank the reviewer for the comment on the multi-subunit complex proteins. We would like to expand our study by determining the interaction of some of the complex subunits with TIP60 (Wild-type), which forms nuclear condensates; TIP60 (HAT mutant), which enters the nucleus but does not form condensates; and TIP60 (K187R), which neither enters the nucleus nor forms condensates. We will include the results of these experiments in the revised manuscript.

      • It is known that over-expression after transient transfection can lead to non-specific acetylation of lysines on the proteins, likely in part to protect from proteasome-mediated degradation. It is not clear whether the Kac sites targeted in the experiments are based on published/public data. In that sense, it is surprising that the K327R mutant does not behave like a HAT-dead mutant (which is what exactly?) or the K187R mutant as this site needs to be auto-acetylated to free the catalytic pocket, so essential for acetyltransferase activity like in all MYST-family HATs. In addition, the effect of K187R on the total acetyl-lysine signal of Tip60 is very surprising as this site does not seem to be a dominant one in public databases.

      Response: We chose the autoacetylation sites based on previously published studies in which LC-MS/MS and in vitro acetylation assays were used to identify autoacetylation sites in TIP60, including K187. We have already mentioned this in the manuscript and have cited the references (1. Yang, C., et al (2012). Function of the active site lysine autoacetylation in Tip60 catalysis. PloS one 7, e32886. 10.1371/journal.pone.0032886. 2. Yi, J., et al (2014). Regulation of histone acetyltransferase TIP60 function by histone deacetylase 3. The Journal of biological chemistry 289, 33878–33886. 10.1074/jbc.M114.575266.). We would like to emphasize that both of these studies identified K187 as an autoacetylation site in TIP60. Since the TIP60 HAT mutant (with significantly reduced catalytic activity) can also enter the nucleus, it is not surprising that the K327R mutant can also enter the nucleus.

      • As the physiological relevance of the results is not clear, the mutants need to be analyzed at the native level of expression to study real functional effects on transcription and localization (ChIP/IF). It is not clear the claim that Tip60 forms nuclear foci/punctate signals at physiological levels is based on what. This is certainly debated because in part of the poor choice of antibodies available for IF analysis. In that sense, it is not clear which Ab is used in the Westerns. Endogenous Tip60 is known to be expressed in multiple isoforms from splice variants, the most dominant one being isoform 2 (PLIP) which lacks a big part (aa96-147) of the so-called IDR domain presented in the study. Does this major isoform behave the same?

      Response: The TIP60 antibody used in the study is from Santa Cruz (Cat. No. sc-166323). This antibody is widely used for TIP60 detection by several methods and has been cited in numerous publications. The catalogue number will be mentioned in the manuscript. Regarding isoforms, three isoforms are known for TIP60, among which isoform 2 is predominantly expressed and is the one used in our study. Isoforms 1 and 2 have the same IDR length (150 amino acids), while isoform 3 has an IDR of 97 amino acids. Interestingly, K187 is present in all the isoforms (already mentioned in the manuscript), and the region missing in isoform 3 (amino acids 96-147) has a lower propensity for disorder (marked with a blue circle). This clearly shows that all the isoforms of TIP60 have the tendency to phase separate.

      Author response image 1.

      • It is extremely strange to show that the K187R mutant fails to get in the nuclei by cell imaging but remains chromatin-bound by fractionation... If K187 is auto-acetylated and required to enter the nucleus, why would a HAT-dead mutant not behave the same?

      Response: We would like to point out that neither the HAT mutant nor the K187R mutant is completely catalytically dead. As our data show, both mutants retain catalytic activity, although at significantly decreased levels. We believe that K187 acetylation is critical for TIP60 to enter the nucleus and that, once TIP60 shuttles into the nucleus, autoacetylation of other sites is required for its efficient chromatin binding. In the fractionation assay, the nuclear membrane is dissolved while preparing the soluble fraction, so there is no hindrance for the K187R mutant in accessing the chromatin. The HAT mutant, in contrast, can acetylate the K187 site and is thus able to enter the nucleus; however, its residual catalytic activity is either unable to autoacetylate the other residues required for efficient chromatin binding or unable to counter the activity of the HDACs that deacetylate TIP60.

      • If K187 acetylation is key to Tip60 function, it would be most logical (and classical) to test a K187Q acetyl-mimic substitution. In that sense, what happens with the R188Q mutant? That all goes back to the fact that this cluster of basic residues looks quite like an NLS.

      Response: As suggested, we will generate an acetylation-mimicking mutant for the K187 site and examine it. The results will be added to the revised manuscript.

      • The effect of the mutant on the TIP60 complex itself needs to be analyzed, e.g. for associated subunits like p400, ING3, TRRAP, Brd8...

      Response: As suggested, we will examine the effect of the mutations on the TIP60 complex.

    1. Author Response:

      Reviewer #1:

      Summary:

      This research study utilizes a realistic motoneuron model to explore the potential to trace back the appropriate levels of excitation, inhibition, and neuromodulation in the firing patterns of motoneurons observed in in-vitro and in-vivo experiments in mammals. The research employs high-performance computing power to achieve its objectives. The work introduces a new framework that enhances understanding of the neural inputs to motoneuron pools, thereby opening up new avenues for hypothesis testing research.

      Strengths: The significance of the study holds relevance for all neuroscientists. Motoneurons are a unique class of neurons with known distribution of outputs for a wide range of voluntary and involuntary motor commands, and their physiological function is precisely understood. More importantly, they can be recorded in-vivo using minimally invasive methods, and they are directly impacted by many neurodegenerative diseases at the spinal cord level. The computational framework developed in this research offers the potential to reverse engineer the synaptic input distribution when assessing motor unit activity in humans, which holds particular importance. Overall, the strength of the findings focuses on providing a novel framework for studying and understanding the inputs that govern motoneuron behavior, with broad applications in neuroscience and potential implications for understanding neurodegenerative diseases. It highlights the significance of the study for various research domains, making it valuable to the scientific community.

      Weaknesses: The exact levels of inhibition, excitation, and neuromodulatory inputs to neural networks are unknown. Therefore the work is based on fine-tuned measures that are indirectly based on experimental results. However, obtaining such physiological information is challenging and currently impossible. From a computational perspective it is a challenge that in theory can be solved. Thus, although we have no ground-truth evidence, this framework can provide compelling evidence for all hypothesis testing research and potentially solve this physiological problem with the use of computers.

      We agree with the reviewer. This work was intended to determine the feasibility of reverse engineering motor unit firing patterns using neuron models with a high degree of realism. Given that the results support this feasibility, our model and technique will serve to construct new hypotheses as well as to test them.

      Reviewer #2:

      The study presents an extensive computational approach to identify the motor neuron input from the characteristics of single motor neuron discharge patterns during a ramp up/down contraction. This reverse engineering approach is relevant due to limitations in our ability to estimate this input experimentally. Using well-established models of single motor neurons, a (very) large number of simulations were performed that allowed identification of this relation. In this way, the results enable researchers to measure motor neuron behavior and from those results determine the underlying neural input scheme. Overall, the results are very convincing and represent an important step forward in understanding the neural strategies for controlling movement.

      Nevertheless, I would suggest that the authors consider the following recommendations to strengthen the message further. First, I believe that the relation between individual motor neuron behavioral characteristics (delta F, brace height etc.) and the motor neuron input properties can be illustrated more clearly. Although this is explained in the text, I believe that this is not optimally supported by figures. Figure 6 to some extent shows this, but figures 8 and 9 as well as Table 1 shows primarily the goodness of fit rather than the actual fit.

      We agree with the reviewer that showing the relationship between the motor neuron behavioral characteristics (delta F, brace height etc.) and the motor neuron input properties would be a great addition to the manuscript. Because the regression models have multiple dimensions (7 inputs and 3 outputs) it is difficult to show the relationship in a static image. We thought it best to show the goodness of fit even though it is more abstract and less intuitive. We added a supplemental diagram to Figure 8 to show the structure of the reverse engineered model that was fit (see Figure 8D).

      Author response image 1: Figure 8. Residual plots showing the goodness of fit of the different predicted values: (A) Inhibition, (B) Neuromodulation and (C) excitatory Weight Ratio. The summary plots are for the models showing the highest R² results in Table 1. The predicted values are calculated using the features extracted from the firing rates (see Figure 7, section Machine learning inference of motor pool characteristics and Regression using motoneuron outputs to predict input organization). Diagram (D) shows the multidimensionality of the RE models (see Model fits), which have 7 feature inputs (see Feature Extraction) predicting 3 outputs (Inhibition, Neuromodulation and Weight Ratio).
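
      To make the "7 feature inputs predicting 3 outputs" structure and the per-output R² concrete, the following is a minimal sketch only. It assumes a plain linear least-squares model and synthetic data; the feature matrix, the weights and the noise level are all hypothetical stand-ins and are not the regression models or the simulation results of the study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (assumption): 500 simulated pools, 7 features each.
n_samples, n_features, n_targets = 500, 7, 3
X = rng.normal(size=(n_samples, n_features))

# Hypothetical ground-truth mapping plus noise; not taken from the study.
true_W = rng.normal(size=(n_features, n_targets))
Y = X @ true_W + 0.1 * rng.normal(size=(n_samples, n_targets))

# Ordinary least squares with a bias column; one coefficient column per output.
X_design = np.column_stack([X, np.ones(n_samples)])
coef, *_ = np.linalg.lstsq(X_design, Y, rcond=None)

# Residuals are what a residual plot would display for each output.
Y_pred = X_design @ coef
residuals = Y - Y_pred

# Coefficient of determination, computed separately for each output.
ss_res = (residuals ** 2).sum(axis=0)
ss_tot = ((Y - Y.mean(axis=0)) ** 2).sum(axis=0)
r2 = 1.0 - ss_res / ss_tot

for name, value in zip(["Inhibition", "Neuromodulation", "Weight Ratio"], r2):
    print(f"{name}: R^2 = {value:.3f}")
```

      Plotting each column of `residuals` against the corresponding column of `Y_pred` would give one residual panel per output, analogous in layout to panels A to C.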

      Second, I would have expected the discussion to have addressed specifically the question of which of the two primary schemes (push-pull, balanced) is the most prevalent. This is the main research question of the study, but it is to some degree left unanswered. Now that the authors have identified the relation between the characteristics of motor neuron behaviors (which has been reported in many previous studies), why not exploit this finding by summarizing the results of previous studies (at least a few representative ones) and discuss the most likely underlying input scheme? Is there a consistent trend towards one of the schemes, or are both strategies commonly used?

      We agree with the reviewer that our discussion should have addressed which of the two primary schemes – push-pull or balanced – is the most prevalent. At first glance, the upper right of Figure 6 looks the most realistic when compared to real data. We would thus expect the push-pull scheme to dominate for the given task. We added a brief section (Push-Pull vs Balance Motor Command) to the discussion to address the reviewer’s comments. This section is not exhaustive but frames the debate using relevant literature. We are also now preparing to deploy these techniques on real data.

      In addition, it seems striking to me that highly non-linear excitation profiles are necessary to obtain a linear CST ramp in many model configurations. Although somewhat speculative, one may expect that an approximately linear relation is desired for robust and intuitive motor control. It seems to me that humans generally have a good ability to accurately grade the magnitude of the motor output, which implies that either a non-linear relation has been learnt (complex task), or that the central nervous system can generally rely on a somewhat linear relation between the neural drive to the muscle and the output (simpler task).

      We agree with the reviewer, and we were surprised by these results. Our motoneuron pool is equipped with persistent inward currents (PICs) which are nonlinear. Therefore, for the motoneuron to produce a linear output the central nervous system would have to incorporate these nonlinearities into its commands.
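
      The reasoning that a nonlinear input-output relation forces a nonlinear command can be illustrated with a toy calculation. The sketch below is an assumption-laden illustration, not the motoneuron model used in the study: `transfer` is a hypothetical monotonic curve (linear drive plus a sigmoidal boost standing in for PIC-like amplification), and it ignores hysteresis and all other dynamics. It numerically inverts that curve to find the input needed for a linear triangular output.

```python
import numpy as np

def transfer(x, gain=0.6, threshold=0.4, steepness=15.0):
    """Toy monotonic input-output curve: linear drive plus a sigmoidal boost.

    All parameter values are hypothetical and chosen only for illustration.
    """
    return x + gain / (1.0 + np.exp(-steepness * (x - threshold)))

# Desired output: a linear triangular ramp, 0 -> 1 -> 0.
t = np.linspace(0.0, 1.0, 201)
target = 1.0 - np.abs(2.0 * t - 1.0)

# Invert the monotonic curve numerically by swapping x and y in np.interp.
grid = np.linspace(-1.0, 2.0, 5001)
required_input = np.interp(target, transfer(grid), grid)

# How far the rising phase of the input departs from a straight line.
half = len(t) // 2
rising = required_input[: half + 1]
chord = np.linspace(rising[0], rising[-1], rising.size)
nonlinearity = np.max(np.abs(rising - chord))

print(f"max deviation of required input from a straight ramp: {nonlinearity:.3f}")
```

      In this toy setting the output is linear by construction, while the required input bends away from a straight ramp wherever the boost switches on.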

      Following this reasoning, it could be interesting to report also for which input scheme, the excitation profile is most linear. I understand that this is not the primary aim of the study, but it may be an interesting way to elaborate on the finding that in many cases non-linear excitation profiles were needed to produce the linear ramp.

      This is a very interesting point. The most realistic firing patterns – with respect to human data – are found in the parameter regions in the upper right in Figure 6, which in fact produce the most nonlinear input (see push-pull pattern in Figure 4C). However, in future studies we hope to separate the total motor command illustrated here into descending and feedback commands. This may result in a more linear descending drive.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      (1) It will be interesting to monitor the levels of another MIM insertase namely, OXA1. This will help to understand whether some of the observed changes in levels of OXPHOS subunits are related to alterations in the amounts of this insertase.

      OXA1 was not detected in the untargeted mass spectrometry analysis, most likely due to the fact that it is a polytopic membrane protein, spanning the membrane five times (1,2). Consequently, we measured OXA1 levels with immunoblotting, comparing patient fibroblast cells to the HC. No significant change in OXA1 steady state levels was observed. 

      See the results below. These results will be added and discussed in the revised manuscript.

      Author response image 1.

      (2) Figure 3: How do the authors explain that although TIMM17 and TIMM23 were found to be significantly reduced by Western analysis they were not detected as such by the Mass Spec. method?

      The untargeted mass spectrometry in the current study failed to detect TIMM17 in both patient fibroblasts and mouse neurons, while TIMM23 was detected only in mouse neurons, where a decrease was observed but was not significant. This is most likely due to the fact that TIMM17 and TIMM23 are both polytopic membrane proteins, spanning the membrane four times, which makes it difficult to extract them in quantities suitable for MS detection (2,3).

      (3) How do the authors explain the higher levels of some proteins in the TIMM50 mutated cells?

      The levels of fully functional TIM23 complex are decreased in patients' fibroblasts. Therefore, the increase in the steady state level of some TIM23 substrate proteins can only be explained by events that occur outside the mitochondria. These could include an increase in transcription, translation or post-translational modifications, all of which may increase their steady state level despite the decrease in the steady state level of the import complex.

      (4) Can the authors elaborate on why mutated cells are impaired in their ability to switch their energetic emphasis to glycolysis when needed?

      Cellular regulation of the metabolic switch to glycolysis occurs via two known pathways: 1) Activation of AMP-activated protein kinase (AMPK) by increased levels of AMP/ADP (4). 2) Inhibition of pyruvate dehydrogenase (PDH) complexes by pyruvate dehydrogenase kinases (PDK) (5). Therefore, changes in the steady state levels of any of these regulators could push the cells towards anaerobic energy production, when needed. In our model systems, we did not observe changes in any of the AMPK, PDH or PDK subunits that were detected in our untargeted mass spectrometry analysis (see volcano plots below, no PDK subunits were detected in patient fibroblasts). Although this doesn’t directly explain why the cells have an impaired ability to switch their energetic emphasis, it does possibly explain why the switch did not occur de facto.

      Author response image 2.

      Reviewer #2 (Public Review):

      (1) The authors claim in the abstract, the introduction, and the discussion that TIMM50 and the TIM23 translocase might not be relevant for mitochondrial protein import in mammals. This is misleading and certainly wrong!!!

      Indeed, it was not our intention to claim that the TIM23 complex might not be relevant. We have now rewritten the relevant parts to convey the correct message:

      Abstract – 

      Line 25 - “Strikingly, TIMM50 deficiency had no impact on the steady state levels of most of its putative substrates, suggesting that even low levels of a functional TIM23 complex are sufficient to maintain the majority of complex-dependent mitochondrial proteome.”

      Introduction – 

      Line 87 - Surprisingly, functional and physiological analysis points to the possibility that low levels of TIM23 complex core subunits (TIMM50, TIMM17 and TIMM23) are sufficient for maintaining steady-state levels of most presequence-containing proteins. However, the reduced TIM23CORE component levels do affect some critical mitochondrial properties and neuronal activity.

      Discussion – 

      Line 339 – “…surprising, as normal TIM23 complex levels are suggested to be indispensable for the translocation of presequence-containing mitochondrial proteins…”

      Line 344 – “…it is possible that unlike what occurs in yeast, normal levels of mammalian TIMM50 and TIM23 complex are mainly essential for maintaining the steady state levels of intricate complexes/assemblies.”

      Line 396 – “In summary, our results suggest that even low levels of TIMM50 and TIM23CORE components suffice in maintaining the majority of mitochondrial matrix and inner membrane proteome. Nevertheless, reductions in TIMM50 levels led to a decrease of many OXPHOS and MRP complex subunits, which indicates that normal TIMM50 levels might be mainly essential for maintaining the steady state levels and assembly of intricate complex proteins.”

      (1) Homberg B, Rehling P, Cruz-Zaragoza LD. The multifaceted mitochondrial OXA insertase. Trends Cell Biol. 2023;33(9):765–72. 

      (2) Carroll J, Altman MC, Fearnley IM, Walker JE. Identification of membrane proteins by tandem mass spectrometry of protein ions. Proc Natl Acad Sci U S A. 2007;104(36):14330–5. 

      (3) Dekker PJT, Keil P, Rassow J, Maarse AC, Pfanner N, Meijer M. Identification of MIM23, a putative component of the protein import machinery of the mitochondrial inner membrane. FEBS Lett. 1993;330(1):66–70. 

      (4) Trefts E, Shaw RJ. AMPK: restoring metabolic homeostasis over space and time. Mol Cell [Internet]. 2021;81(18):3677–90. Available from: https://doi.org/10.1016/j.molcel.2021.08.015

      (5) Zhang S, Hulver MW, McMillan RP, Cline MA, Gilbert ER. The pivotal role of pyruvate dehydrogenase kinases in metabolic flexibility. Nutr Metab. 2014;11(1):1–9.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      We thank the reviewer for his valuable input and careful assessment, which have significantly improved the clarity and rigor of our manuscript.

      Summary:

      Mazer & Yovel 2025 dissect the inverse problem of how echolocators in groups manage to navigate their surroundings despite intense jamming using computational simulations.

      The authors show that despite the 'noisy' sensory environments that echolocating groups present, agents can still access some amount of echo-related information and use it to navigate their local environment. It is known that echolocating bats have strong small and large-scale spatial memory that plays an important role for individuals. The results from this paper also point to the potential importance of an even lower-level, short-term role of memory in the form of echo 'integration' across multiple calls, despite the unpredictability of echo detection in groups. The paper generates a useful basis to think about the mechanisms in echolocating groups for experimental investigations too.

      Strengths:

      (1) The paper builds on biologically well-motivated and parametrised 2D acoustics and sensory simulation setup to investigate the various key parameters of interest

      (2) The 'null-model' of echolocators not being able to tell apart objects & conspecifics while echolocating still shows agents successfully emerge from groups - even though the probability of emergence drops severely in comparison to cognitively more 'capable' agents. This is nonetheless an important result showing the direction-of-arrival of a sound itself is the 'minimum' set of ingredients needed for echolocators navigating their environment.

      (3) The results generate an important basis in unraveling how agents may navigate in sensorially noisy environments with a lot of irrelevant and very few relevant cues.

      (4) The 2D simulation framework is simple and computationally tractable enough to perform multiple runs to investigate many variables - while also remaining true to the aim of the investigation.

      Weaknesses:

      There are a few places in the paper that can be misunderstood or don't provide complete details. Here is a selection:

      (1) Line 61: '... studies have focused on movement algorithms while overlooking the sensory challenges involved' : This statement does not match the recent state of the literature. While the previous models may have had the assumption that all neighbours can be detected, there are models that specifically study the role of limited interaction arising from a potential inability to track all neighbours due to occlusion, and the effect of responding to only one/few neighbours at a time e.g. Bode et al. 2011 R. Soc. Interface, Rosenthal et al. 2015 PNAS, Jhawar et al. 2020 Nature Physics.

      We appreciate the reviewer's comment and the relevant references. We have revised the manuscript accordingly to clarify the distinction between studies that incorporate limited interactions and those that explicitly analyze sensory constraints and interference. We have refined our statement to acknowledge these contributions while maintaining our focus on sensory challenges beyond limited neighbor detection, such as signal degradation, occlusion effects, and multimodal sensory integration (see lines 61-64):

      While collective movement has been extensively studied in various species, including insect swarming, fish schooling, and bird murmuration (Pitcher, Partridge and Wardle, 1976; Partridge, 1982; Strandburg-Peshkin et al., 2013; Pearce et al., 2014; Rosenthal, Twomey, Hartnett, Wu, Couzin, et al., 2015; Bastien and Romanczuk, 2020; Davidson et al., 2021; Aidan, Bleichman and Ayali, 2024), as well as in swarm robotics agents performing tasks such as coordinated navigation and maze-solving (Faria Dias et al., 2021; Youssefi and Rouhani, 2021; Cheraghi, Shahzad and Graffi, 2022), most studies have focused on movement algorithms, often assuming full detection of neighbors (Parrish and Edelstein-Keshet, 1999; Couzin et al., 2002, 2005; Sumpter et al., 2008; Nagy et al., 2010; Bialek et al., 2012; Gautrais et al., 2012; Attanasi et al., 2014). Some models have incorporated limited interaction rules where individuals respond to one or a few neighbors due to sensory constraints (Bode, Franks and Wood, 2011; Jhawar et al., 2020). However, fewer studies explicitly examine how sensory interference, occlusion, and noise shape decision-making in collective systems (Rosenthal et al., 2015).

      (2) The word 'interference' is used loosely in places (Line 89: '...took all interference signals...', Line 319: 'spatial interference') - this is confusing as it is not clear whether the authors refer to interference in the physics/acoustics sense, or broadly speaking as a synonym for reflections and/or jamming.

      To improve clarity, we have revised the manuscript to distinguish between different types of interference:

      · Acoustic interference (jamming): Overlapping calls that completely obscure echo detection, preventing bats from perceiving necessary environmental cues.

      · Acoustic interference (masking): Partial reduction in signal clarity due to competing calls.

      · Spatial interference: Physical obstruction by conspecifics affecting movement and navigation.

      We have updated the manuscript to use these terms consistently and explicitly define them in relevant sections (see lines 87-94 and 329-330). This distinction ensures that the reader can differentiate between interference as an acoustic phenomenon and its broader implications in navigation.

      (3) The paper discusses original results without reference to how they were obtained or what was done. The lack of detail here must be considered while interpreting the Discussion e.g. Line 302 ('our model suggests...increasing the call-rate..' - no clear mention of how/where call-rate was varied) & Line 323 '..no benefit beyond a certain level..' - also no clear mention of how/where call-level was manipulated in the simulations.

      All tested parameters, including call rate dynamics and call intensity variations, are detailed in the Methods section and Tables 1 and 2. Specifically:

      · Call Rate Variation: The Inter-Pulse Interval (IPI) was modeled based on documented echolocation behavior, decreasing from 100 msec during the search phase to 35 msec (~28 calls per second) at the end of the approach phase, and to 5 msec (200 calls per second) during the final buzz (see Table 2). This natural variation in call rate was not manually manipulated in the model but emerged from the simulated bat behavior.

      · Call Intensity Variation: The tested call intensity levels (100, 110, 120, 130 dB SPL) are presented in Table 1 under the “Call Level” parameter. The effect of increasing call intensity was analyzed in relation to exit probability, jamming probability, and collision rate. This is now explicitly referenced in the Discussion.

      We have revised the manuscript to explicitly reference these aspects in the Results and Discussion sections.
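      The relation between inter-pulse interval and call rate quoted above is simple arithmetic, and the short Python sketch below makes it explicit. Only the three IPI values come from the response; the function and variable names are illustrative assumptions, not part of the authors' model code.

```python
# Convert an inter-pulse interval (IPI, in milliseconds) to a call rate
# (calls per second). Names are illustrative, not from the authors' code.
def call_rate_hz(ipi_ms: float) -> float:
    return 1000.0 / ipi_ms

# IPI values quoted in the response: search phase, end of approach, final buzz.
phase_ipi_ms = {"search": 100.0, "approach_end": 35.0, "buzz": 5.0}
phase_rates = {phase: call_rate_hz(ipi) for phase, ipi in phase_ipi_ms.items()}

for phase, rate in phase_rates.items():
    print(f"{phase}: IPI {phase_ipi_ms[phase]:.0f} ms -> {rate:.1f} calls/s")
```

      The 35 msec interval corresponds to slightly more than 28 calls per second, consistent with the approximate figure given in the response.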

      Reviewer #2 (Public review):

      We are grateful for the reviewer’s insightful feedback, which has helped us clarify key aspects of our research and strengthen our conclusions.

      This manuscript describes a detailed model of bats flying together through a fixed geometry. The model considers elements that are faithful to both bat biosonar production and reception and the acoustics governing how sound moves in the air and interacts with obstacles. The model also incorporates behavioral patterns observed in bats, like one-dimensional feature following and temporal integration of cognitive maps. From a simulation study of the model and comparison of the results with the literature, the authors gain insight into how often bats may experience destructive interference of their acoustic signals and those of their peers, and how much such interference may actually negatively affect the groups' ability to navigate effectively. The authors use generalized linear models to test the significance of the effects they observe.

      In terms of its strengths, the work relies on a thoughtful and detailed model that faithfully incorporates salient features, such as acoustic elements like the filter for a biological receiver and temporal aggregation as a kind of memory in the system. At the same time, the authors abstract away features that would complicate the model without being expected to give additional insights, as can be seen in the choice of a two-dimensional rather than three-dimensional system. I thought that the level of abstraction in the model was perfect, enough to demonstrate their results without needless details. The results are compelling and interesting, and the authors do a great job discussing them in the context of the biological literature.

      The most notable weakness I found in this work was that some aspects of the model were not entirely clear to me.

      For example, the directionality of the bat's sonar call in relation to its velocity. Are these the same?

      For simplicity, in our model, the head is aligned with the body, therefore the direction of the echolocation beam is the same as the direction of the flight.

      Moreover, call directionality (directivity) is not directly influenced by velocity. Instead, directionality is estimated using the piston model, as described in the Methods section. The directionality is based on the emission frequency and is thus primarily linked to the behavioral phases of the bat, with frequency shifts occurring as the bat transitions from search to approach to buzz phases. During the approach phase, the bat emits calls with higher frequencies, resulting in increased directionality. This is supported by the literature (Jakobsen and Surlykke, 2010; Jakobsen, Brinkløv and Surlykke, 2013). This phase is also associated with a natural reduction in flight speed, which is a well-documented behavioral adaptation in echolocating bats (Jakobsen et al., 2024).

      To clarify this in the manuscript, we have updated the text to explicitly state that directionality follows phase-dependent frequency changes rather than being a direct function of velocity, see lines 460-465.
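      To illustrate why higher emission frequencies give a narrower beam, the sketch below evaluates the textbook directivity of a circular piston in an infinite baffle, D(θ) = |2·J1(ka·sin θ) / (ka·sin θ)| with k = 2πf/c. This is the standard form of the piston model and may differ in detail from the implementation described in the Methods. The aperture radius (3 mm), the two frequencies, and all function names are hypothetical values chosen only for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air (assumed value)

def bessel_j1(x: float, steps: int = 2000) -> float:
    """First-order Bessel function from its integral representation,
    J1(x) = (1/pi) * integral over [0, pi] of cos(tau - x*sin(tau)),
    evaluated with the midpoint rule."""
    h = math.pi / steps
    total = 0.0
    for i in range(steps):
        tau = (i + 0.5) * h
        total += math.cos(tau - x * math.sin(tau))
    return total * h / math.pi

def piston_directivity(theta_rad: float, freq_hz: float, radius_m: float) -> float:
    """Relative pressure amplitude of a circular piston at off-axis angle theta."""
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND  # wavenumber
    x = k * radius_m * math.sin(theta_rad)
    if abs(x) < 1e-9:
        return 1.0  # on-axis limit of 2*J1(x)/x
    return abs(2.0 * bessel_j1(x) / x)

# Hypothetical emitter: 3 mm radius, compared at 30 kHz and 60 kHz, 30 degrees off-axis.
radius = 0.003
angle = math.radians(30.0)
d_low = piston_directivity(angle, 30e3, radius)
d_high = piston_directivity(angle, 60e3, radius)
print(f"30 kHz: {d_low:.3f}, 60 kHz: {d_high:.3f}")
```

      For the same off-axis angle, the higher frequency yields a smaller relative amplitude, which is the sense in which calls in the approach phase are more directional.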

      If so, what is the difference between phi_target and phi_tx in the model equations?

      represents the angle between the bat and the reflected object (target).

      the angle [rad], between the masking bat and target (from the transmitter’s perspective)

      refers to the angle between the transmitting conspecific and the receiving focal bat, from the transmitter’s point of view.

      represents the angle between the receiving bat and the transmitting bat, from the receiver’s point of view.

      These definitions have been explicitly stated in the revised manuscript to prevent any ambiguity (lines 467-468). Additionally, a Supplementary figure demonstrating the geometrical relations has been added to the manuscript.

      Author response image 1.

      What is a bat's response to colliding with a conspecific (rather than a wall)?

      In nature, minor collisions between bats are common and typically do not result in significant disruptions to flight (Boerma et al., 2019; Roy et al., 2019; Goldstein et al., 2024). Given this, our model does not explicitly simulate the physical impact of a collision event. Instead, during the collision event the bat keeps decreasing its velocity and changing its flight direction until the distance between bats is above the threshold (0.4 m). We assume that the primary cost of such interactions arises from the effort required to avoid collisions, rather than from the collision itself. This assumption aligns with observations of bat behavior in dense flight environments, where individuals prioritize collision avoidance; accordingly, we model avoidance rather than post-collision dynamics.

      From the statistical side, it was not clear if replicate simulations were performed. If they were, which I believe is the right way due to stochasticity in the model, how many replicates were used, and are the standard errors referred to throughout the paper between individuals in the same simulation or between independent simulations, or both?

      The number of repetitions for each scenario is detailed in Table 1, but we included it in a more prominent location in the text for clarity. Specifically, we now state (Lines 274-275):

      "The number of repetitions for each scenario was as follows: 1 bat: 240; 2 bats: 120; 5 bats: 48; 10 bats: 24; 20 bats: 12; 40 bats: 12; 100 bats: 6."

      Regarding the reported standard errors, they are calculated across all individuals within each scenario, without distinguishing between different simulation trials.

      We clarified this in the revised text (lines 534-535 in Statistical Analysis).
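      Because the standard errors are pooled across all individuals in a scenario, the effective sample size is the number of bats multiplied by the number of repetitions. The sketch below works this out from the repetition counts quoted above; the helper `standard_error` and all variable names are illustrative assumptions rather than the authors' analysis code.

```python
import math
import statistics

# Repetitions per group size, as quoted from the revised text.
repetitions_per_group_size = {1: 240, 2: 120, 5: 48, 10: 24, 20: 12, 40: 12, 100: 6}

# Number of simulated individuals pooled in each scenario.
individuals_per_scenario = {
    bats: bats * reps for bats, reps in repetitions_per_group_size.items()
}

def standard_error(values):
    """Standard error of the mean over all pooled individuals
    (sample standard deviation divided by sqrt(n))."""
    return statistics.stdev(values) / math.sqrt(len(values))

for bats, total in individuals_per_scenario.items():
    print(f"{bats:>3} bats x {repetitions_per_group_size[bats]:>3} runs = {total} individuals")
```

      With these counts, the scenarios of up to 20 bats each pool 240 individuals, while the 40-bat and 100-bat scenarios pool 480 and 600 individuals, respectively.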

      Overall, I found these weaknesses to be superficial and easily remedied by the authors. The authors presented well-reasoned arguments that were supported by their results, and which were used to demonstrate how call interference impacts the collective's roost exit as measured by several variables. As the authors highlight, I think this work is valuable to individuals interested in bat biology and behavior, as well as to applications in engineered multi-agent systems like robotic swarms.

      Reviewer #3 (Public review):

      We sincerely appreciate the reviewer’s thoughtful comments and the time invested in evaluating our work, which have greatly contributed to refining our study.

      We would like to note that, in general, our model often simplifies some of the bats’ abilities, under the assumption that if the simulated bats manage to perform this difficult task with simpler mechanisms, real, better-adapted bats will probably perform even better. This line of reasoning recurs in several of the answers below.

      Summary:

      The authors describe a model to mimic bat echolocation behavior and flight under high-density conditions and conclude that the problem of acoustic jamming is less severe than previously thought, conflating the success of their simulations (as described in the manuscript) with hard evidence for what real bats are actually doing. The authors base their model on two species of bats that fly at "high densities" (defined by the authors as colony sizes from tens to tens of thousands of individuals and densities of up to 33.3 bats/m²), Pipistrellus kuhlii and Rhinopoma microphyllum. This work fits into the broader discussion of bat sensorimotor strategies during collective flight, and simulations are important to try to understand bat behavior, especially given a lack of empirical data. However, I have major concerns about the assumptions of the parameters used for the simulation, which significantly impact both the results of the simulation and the conclusions that can be made from the data. These details are elaborated upon below, along with key recommendations the authors should consider to guide the refinement of the model.

      Strengths:

      This paper carries out a simulation of bat behavior in dense swarms as a way to explain how jamming does not pose a problem in dense groups. Simulations are important when we lack empirical data. The simulation aims to model two different species with different echolocation signals, which is very important when trying to model echolocation behavior. The analyses are fairly systematic in testing all ranges of parameters used and discussing the differential results.

      Weaknesses:

      The justification for how the different foraging phase call types were chosen for different object detection distances in the simulation is unclear. Do these distances match those recorded from empirical studies, and if so, are they identical for both species used in the simulation?

      The distances at which bats transition between echolocation phases are identical for both species in our model (see Table 2). These distances are based on well-documented empirical studies of bat hunting and obstacle avoidance behavior (Griffin, Webster and Michael, 1958; Simmons and Kick, 1983; Schnitzler et al., 1987; Kalko, 1995; Hiryu et al., 2008; Vanderelst and Peremans, 2018). These references provide extensive evidence that insectivorous bats systematically adjust their echolocation calls in response to object proximity, following the characteristic phases of search, approach, and buzz.

      To improve clarity, we have updated the text to explicitly state that the phase transition distances are empirically grounded and apply equally to both modeled species (lines 430-447).

      What reasoning do the authors have for a bat using the same call characteristics to detect a cave wall as they would for detecting a small insect?

      In echolocating bats, call parameters are primarily shaped by the target distance and echo strength. Accordingly, there is little difference in call structure between prey capture and obstacle-related maneuvers, aside from intensity adjustments based on target strength (Hagino et al., 2007; Hiryu et al., 2008; Surlykke, Ghose and Moss, 2009; Kothari et al., 2014). In our study, due to the dense cave environment, the bats are found to operate in the approach phase nearly all the time, which is consistent with natural cave emergence, where they are navigating through a cluttered environment rather than engaging in open-space search. For one of the species (Rhinopoma microphyllum), we also have empirical recordings of individuals flying under similar conditions (Goldstein et al., 2024). Our model was designed to remain as simple as possible while relying on conservative assumptions that may underestimate bat performance. If, in reality, bats fine-tune their echolocation calls even earlier or more precisely during navigation than assumed, our model would still conservatively reflect their actual capabilities.

      We actually used logarithmically frequency-modulated (FM) chirps, generated using the MATLAB built-in function chirp(t, f0, t1, f1, 'logarithmic'). This method aligns with the nonlinear FM characteristics of Pipistrellus kuhlii (PK) and Rhinopoma microphyllum (RM) and provides a realistic approximation of their echolocation signals. We acknowledge that this was not sufficiently emphasized in the original text, and we have now explicitly highlighted this in the revised version to ensure clarity (see lines 447-449 in Methods).
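      For readers without MATLAB, the following Python sketch generates a logarithmic FM sweep using the same frequency law as the 'logarithmic' option of chirp, namely f(t) = f0·(f1/f0)^(t/t1). It is a minimal illustration only: the start and end frequencies, duration, and sampling rate are hypothetical values, not the call parameters of either modeled species, and the function names are assumptions.

```python
import math

def instantaneous_frequency(t, duration_s, f0_hz, f1_hz):
    """Frequency law of a logarithmic sweep: f(t) = f0 * (f1/f0) ** (t/duration)."""
    return f0_hz * (f1_hz / f0_hz) ** (t / duration_s)

def log_chirp(duration_s, f0_hz, f1_hz, sample_rate_hz):
    """Sample cos(phase(t)), where phase is the time integral of the law above.
    Requires f0 and f1 to be positive and different from each other."""
    ratio = f1_hz / f0_hz
    log_ratio = math.log(ratio)
    n_samples = int(round(duration_s * sample_rate_hz))
    samples = []
    for i in range(n_samples):
        t = i / sample_rate_hz
        phase = (2.0 * math.pi * f0_hz * duration_s / log_ratio
                 * (ratio ** (t / duration_s) - 1.0))
        samples.append(math.cos(phase))
    return samples

# Hypothetical downward sweep: 80 kHz to 40 kHz over 3 ms, sampled at 500 kHz.
call = log_chirp(0.003, 80e3, 40e3, 500e3)
mid_freq = instantaneous_frequency(0.0015, 0.003, 80e3, 40e3)
print(len(call), round(mid_freq))
```

      In a logarithmic sweep the frequency at the temporal midpoint is the geometric mean of the start and end frequencies, so the sweep spends more time near the lower frequencies than a linear sweep would.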

      The two species modeled have different calls. In particular, the bandwidth varies by a factor of 10, meaning the species' sonars will have different spatial resolutions. Range resolution is about 10x better for PK compared to RM, but the authors appear to use the same thresholds for "correct detection" for both, which doesn't seem appropriate.

      The detection process in our model is based on Saillant’s method using a filter bank, as detailed in the paper (Saillant et al., 1993; Neretti et al., 2003; Sanderson et al., 2003). This approach inherently incorporates the advantages of a wider bandwidth, meaning that the differences in range resolution between the species are already accounted for within the signal-processing framework. Thus, there is no need to explicitly adjust the model parameters for bandwidth variations, as these effects emerge from the applied method.

      Also, the authors did not mention incorporating/correcting for/exploiting Doppler, which leads me to assume they did not model it.

      The reviewer is correct. To maintain model simplicity, we did not incorporate the Doppler effect or its impact on echolocation. The exclusion of Doppler effects was based on the assumption that while Doppler shifts can influence frequency perception, their impact on jamming and overall navigation performance is minor within the modelled context.

      The maximal Doppler shifts expected for the bats in this scenario are ~1 kHz. These shifts would be applied variably across signals due to the semi-random relative velocities between bats, leading to a mixed effect on frequency changes. This variability would likely result in an overall reduction in jamming rather than exacerbating it, aligning with our previous statement that our model may overestimate the severity of acoustic interference. Such Doppler shifts would result in errors of 2-4 cm in localization (i.e., 200-400 microseconds) (Boonman, Parsons and Jones, 2003).

      We have now explicitly highlighted this in the revised version (see Lines 468-470).
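      As a rough order-of-magnitude check on the Doppler shift discussed above, the sketch below applies the standard moving-source, moving-receiver Doppler formula. The call frequency (40 kHz) and flight speed (5 m/s) are hypothetical values chosen for illustration and are not taken from the manuscript; the function name is likewise an assumption.

```python
SPEED_OF_SOUND = 343.0  # m/s in air (assumed value)

def doppler_shifted_freq(freq_hz, receiver_speed_ms, source_speed_ms,
                         c=SPEED_OF_SOUND):
    """Received frequency when source and receiver approach each other.
    Speeds are the components along the line joining them (positive = closing).
    An echo from a stationary wall approached at speed v follows the same
    formula with receiver_speed = source_speed = v."""
    return freq_hz * (c + receiver_speed_ms) / (c - source_speed_ms)

# Hypothetical case: a 40 kHz call, with two bats each flying at 5 m/s head-on
# (or, equivalently, one bat at 5 m/s listening to its own echo from a wall).
call_freq = 40e3
shift_hz = doppler_shifted_freq(call_freq, 5.0, 5.0) - call_freq
print(f"Doppler shift: {shift_hz:.0f} Hz")
```

      Under these assumed values the shift is on the order of 1 kHz, the same order of magnitude as the maximal shift stated in the response.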

      The success of the simulation may very well be due to variation in the calls of the bats, which ironically enough demonstrates the importance of a jamming avoidance response in dense flight. This explains why the performance of the simulation falls when bats are not able to distinguish their own echoes from other signals. For example, in Figure C2, there are calls that are labeled as conspecific calls and have markedly shorter durations and wider bandwidths than others. These three phases for call types used by the authors may be responsible for some (or most) of the performance of the model since the correlation between different call types is unlikely to exceed the detection threshold. But it turns out this variation in and of itself is what a jamming avoidance response may consist of. So, in essence, the authors are incorporating a jamming avoidance response into their simulation.

      We fully agree that the natural variations in call design between the phases contribute significantly to interference reduction (see our discussion in a previous paper in Mazar & Yovel, 2020). However, we emphasize that this cannot be classified as a Jamming Avoidance Response (JAR). In our model, bats respond only to the physical presence of objects and not to the acoustic environment or interference itself. There is no active or adaptive adjustment of call design to minimize jamming beyond the natural phase-dependent variations in call structure. Therefore, while variation in call types does inherently reduce interference, this effect emerges passively from the modeled behavior rather than as an intentional strategy to avoid jamming.

      The authors claim that integration over multiple pings (though I was not able to determine the specifics of this integration algorithm) reduces the masking problem. Indeed, it should: if you have two chances at detection, you've effectively increased your SNR by 3dB.

      The reviewer is correct. Indeed, integration over multiple calls improves signal-to-noise ratio (SNR), effectively increasing it by approximately 3 dB per doubling of observations. The specifics of the integration algorithm are detailed in the Methods section, where we describe how sensory information is aggregated across multiple time steps to enhance detection reliability.
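      The "3 dB per doubling" figure follows from the idealized case in which the noise is independent between calls and the echoes add consistently, so that averaging N observations improves the power signal-to-noise ratio by a factor of N. The sketch below computes this gain; it is a textbook idealization and does not describe the integration algorithm used in the model.

```python
import math

def integration_gain_db(n_calls: int) -> float:
    """Idealized SNR gain (in dB) from averaging n_calls observations with
    independent noise: 10 * log10(n_calls)."""
    if n_calls < 1:
        raise ValueError("n_calls must be at least 1")
    return 10.0 * math.log10(n_calls)

for n in (1, 2, 4, 5, 10):
    print(f"{n:>2} calls: {integration_gain_db(n):.2f} dB")
```

      Under this idealization, the integration windows of 1 to 10 calls tested in the study would correspond to gains between 0 and 10 dB.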

      They also claim - although it is almost an afterthought - that integration dramatically reduces the degradation caused by false echoes. This also makes sense: from one ping to the next, the bat's own echo delays will correlate extremely well with the bat's flight path. Echo delays due to conspecifics will jump around kind of randomly. However, the main concern is regarding the time interval and number of pings of the integration, especially in the context of the bat's flight speed. The authors say that a 1s integration interval (5-10 pings) dramatically reduces jamming probability and echo confusion. This number of pings isn't very high, and it occurs over a time interval during which the bat has moved 5-10m. This distance is large compared to the 0.4m distance-to-obstacle that triggers an evasive maneuver from the bat, so integration should produce a latency in navigation that significantly hinders the ability to avoid obstacles. Can the authors provide statistics that describe this latency, and discussion about why it doesn't seem to be a problem?

      As described in the Methods section, the bat’s collision avoidance response does not rely solely on the integration process. Instead, the model incorporates real-time echoes from the last calls, which are used independently of the integration process for immediate obstacle avoidance maneuvers. This ensures that bats can react to nearby obstacles without being hindered by the integration latency. The slower integration, on the other hand, is used for clustering, outlier removal, and estimation of wall directions to support the pathfinding process, as illustrated in Supplementary Figure 1.

      Additionally, our model assumes that bats store the physical positions of echoes in an allocentric coordinate system (x-y). The integration occurs after transforming these detections from a local relative reference frame to a global spatial representation. This allows for stable environmental mapping while maintaining responsiveness to immediate changes in the bat’s surroundings.

      See lines 518-523 in the revised version.
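      The transformation from a bat-centered detection to the allocentric x-y frame described above can be sketched as follows. This is a minimal illustration under stated assumptions: detections are represented as (range, bearing) pairs, bearing is measured counterclockwise from the flight direction, and all names are hypothetical rather than taken from the model code.

```python
import math

def to_allocentric(bat_x, bat_y, heading_rad, range_m, bearing_rad):
    """Map an egocentric detection (range, bearing relative to the heading)
    to global x-y coordinates, given the bat's position and heading."""
    angle = heading_rad + bearing_rad
    return (bat_x + range_m * math.cos(angle),
            bat_y + range_m * math.sin(angle))

# Hypothetical example: bat at (1, 2) heading along +y, echo from 3 m dead ahead.
obstacle_x, obstacle_y = to_allocentric(1.0, 2.0, math.pi / 2, 3.0, 0.0)
print(round(obstacle_x, 6), round(obstacle_y, 6))
```

      Once detections from successive calls are expressed in the same global frame, echoes from a fixed wall fall on nearly the same coordinates even though the bat has moved, whereas detections caused by moving conspecifics scatter, which is what makes clustering and outlier removal possible.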

      The authors are using a 2D simulation, but this very much simplifies the challenge of a 3D navigation task, and there is no explanation as to why this is appropriate. Bat densities and bat behavior are discussed per unit area when realistically it should be per unit volume. In fact, the authors reference studies to justify the densities used in the simulation, but these studies were done in a 3D world. If the authors have justification for why it is realistic to model a 3D world in a 2D simulation, I encourage them to provide references justifying this approach.

      We acknowledge that this is a simplification; however, from an echolocation perspective, a 2D framework represents a worst-case scenario in terms of bat densities and maneuverability:

      · Higher Effective Density: A 2D model forces all bats into a single plane rather than distributing them through a 3D volume, increasing the likelihood of overlap in calls and echoes and making jamming more severe. As described in the text, the average distance to the nearest bat in our simulation is 0.27 m (with 100 bats), whereas reported distances in very dense colonies are 0.5 m, as observed in Myotis grisescens and Tadarida brasiliensis (Sabol and Hudson, 1995; Betke et al., 2008; Gillam et al., 2010; Fujioka et al., 2021).

      · Reduced Maneuverability: In 3D space, bats can use vertical movement to avoid obstacles and conspecifics. A 2D constraint eliminates this degree of freedom, increasing collision risk and limiting escape options.

      Thus, our 2D model provides a conservative, difficult test case, ensuring that our findings are valid under conditions where jamming and collision risks are maximized. Additionally, the 2D framework is computationally efficient, allowing us to perform multiple simulation runs to explore a broad parameter space and systematically test the impact of different variables.

      To address the reviewer’s concern, we have clarified this justification in the revised text and will provide supporting references where applicable (see Methods, lines 407-412).

      The focus on "masking" (which appears to be just in-band noise), especially relative to the problem of misassigned echoes, is concerning. If the bat calls are all the same waveform (downsweep linear FM of some duration, I assume - it's not clear from the text), false echoes would be a major problem. Masking, as the authors define it, just reduces SNR. This reduction is something like sqrt(N), where N is the number of conspecifics whose echoes are audible to the bat, so this allows the detection threshold to be set lower, increasing the probability that a bat's echo will exceed a detection threshold. False echoes present a very different problem. They do not reduce SNR per se, but rather they cause spurious threshold excursions (N of them!) that the bat cannot help but interpret as obstacle detection. I would argue that in dense groups the mis-assignment problem is much more important than the SNR problem.

      There is substantial literature supporting the assumption that bats can recognize their own echoes and distinguish them from conspecific signals (Schnitzler and Bioscience, 2001; Kazial, Burnett and Masters, 2001; Burnett and Masters, 2002; Kazial, Kenny and Burnett, 2008; Chili, Xian and Moss, 2009; Yovel et al., 2009; Beetz and Hechavarría, 2022). However, we acknowledge that false echoes may present a major challenge in dense groups. To address this, we explicitly tested the impact of the self-echo identification assumption in our study (see Results, Figure 4: The impact of confusion on performance, and lines 345-355 in the Discussion).

      Furthermore, we examined a full confusion scenario, where all reflected echoes from conspecifics were misinterpreted as obstacle reflections (i.e., 100% confusion). Our results show that this significantly degrades navigation performance, supporting the argument that echo misassignment is a critical issue. However, we also explored a simple mitigation strategy based on temporal integration with outlier rejection, which provided some improvement in performance. This suggests that real bats may possess additional mechanisms to enhance self-echo identification and reduce false detections. See lines XX in the manuscript for further discussion.

      The criteria set for flight behavior (lines 393-406) are not justified with any empirical evidence of the flight behavior of wild bats in collective flight. How did the authors determine the avoidance distances? Also, what is the justification for the time limit of 15 seconds to emerge from the opening? Instead of an exit probability, why not instead use a time criterion, similar to "How long does it take X% of bats to exit?"

      While we acknowledge that wild bats may employ more complex behaviors for collision avoidance, we chose to implement a simplified decision-making rule in our model to maintain computational tractability.

      The avoidance distances (1.5 m from walls and 0.4 m from other bats) were selected as internal parameters to ensure coherent flight trajectories while maintaining a reasonable collision rate. These distances provide a balance between maneuverability and stability, preventing erratic flight patterns while still enabling effective obstacle avoidance. In the revised paper, we have added supplementary figures illustrating the effect of model parameters on performance, specifically focusing on the avoidance distance.

      The 15-second exit limit was determined as described in the text (Lines 403-404): “A 15-second window was chosen because it is approximately twice the average exit time for 40 bats and allows for a second corrective maneuver if needed.” In other words, it allowed each bat to circle the ‘cave’ twice to exit even in the most crowded environment. This threshold was set to keep simulation time reasonable while allowing sufficient time for most bats to exit successfully.

      We acknowledge that the alternative approach suggested by the reviewer, measuring the time taken for a certain percentage of bats to exit, is also valid. However, in our model, some outlier bats fail to exit and continue flying for many minutes. Such runs would lead to excessive simulation times, making it difficult to generate repetitions while teaching us little, since they usually result from the bat slightly missing the opening (see Video S1). Our chosen approach keeps runtime practical while still capturing the relevant performance metrics.

      What is the empirical justification for the 1-10 calls used for integration?

      The "average exit time for 40 bats" is also confusing and not well explained. Was this determined empirically? From the simulation? If the latter, what are the conditions? Does it include masking, no masking, or which species?

      Previous studies have demonstrated that bats integrate acoustic information received sequentially over several echolocation calls (2-15), effectively constructing an auditory scene in complex environments (Ulanovsky and Moss, 2008; Chili, Xian and Moss, 2009; Moss and Surlykke, 2010; Yovel and Ulanovsky, 2017; Salles, Diebold and Moss, 2020). Additionally, bats are known to produce echolocation sound groups when spatiotemporal localization demands are high (Kothari et al., 2014). Studies have documented call sequences ranging from 2 to 15 grouped calls (Moss et al., 2010), and it has been hypothesized that grouping facilitates echo segregation.

      We did not use a single integration window - we tested integration sizes between 1 and 10 calls and presented the results in Figure 3A. This range was chosen based on prior empirical findings and to explore how different levels of temporal aggregation impact navigation performance. Indeed, the results showed that performance levels off for integration windows of 5-10 calls (Figure 3A).

      Regarding the average exit time for 40 bats, this value was determined from our simulations, where it represents the mean time for successful exits under standard conditions with masking.

      We have revised the text to clarify these details (see line 466).

      References:

      Aidan, Y., Bleichman, I. and Ayali, A. (2024) ‘Pausing to swarm: locust intermittent motion is instrumental for swarming-related visual processing’, Biology letters, 20(2), p. 20230468. Available at: https://doi.org/10.1098/rsbl.2023.0468.

      Attanasi, A. et al. (2014) ‘Collective Behaviour without Collective Order in Wild Swarms of Midges’. Edited by T. Vicsek, 10(7). Available at: https://doi.org/10.1371/journal.pcbi.1003697.

      Bastien, R. and Romanczuk, P. (2020) ‘A model of collective behavior based purely on vision’, Science Advances, 6(6). Available at: https://doi.org/10.1126/sciadv.aay0792.

      Beetz, M.J. and Hechavarría, J.C. (2022) ‘Neural Processing of Naturalistic Echolocation Signals in Bats’, Frontiers in Neural Circuits, 16, p. 899370. Available at: https://doi.org/10.3389/FNCIR.2022.899370.

      Betke, M. et al. (2008) ‘Thermal Imaging Reveals Significantly Smaller Brazilian Free-Tailed Bat Colonies Than Previously Estimated’, Journal of Mammalogy, 89(1), pp. 18–24. Available at: https://doi.org/10.1644/07-MAMM-A-011.1.

      Bialek, W. et al. (2012) ‘Statistical mechanics for natural flocks of birds’, Proceedings of the National Academy of Sciences, 109(13), pp. 4786–4791. Available at: https://doi.org/10.1073/PNAS.1118633109.

      Bode, N.W.F., Franks, D.W. and Wood, A.J. (2011) ‘Limited interactions in flocks: Relating model simulations to empirical data’, Journal of the Royal Society Interface, 8(55), pp. 301–304. Available at: https://doi.org/10.1098/RSIF.2010.0397.

      Boerma, D.B. et al. (2019) ‘Wings as inertial appendages: How bats recover from aerial stumbles’, Journal of Experimental Biology, 222(20). Available at: https://doi.org/10.1242/JEB.204255.

      Boonman, A.M., Parsons, S. and Jones, G. (2003) ‘The influence of flight speed on the ranging performance of bats using frequency modulated echolocation pulses’, The Journal of the Acoustical Society of America, 113(1), p. 617. Available at: https://doi.org/10.1121/1.1528175.

      Burnett, S.C. and Masters, W.M. (2002) ‘Identifying Bats Using Computerized Analysis and Artificial Neural Networks’, North American Symposium on Bat Research, 9.

      Cheraghi, A.R., Shahzad, S. and Graffi, K. (2022) ‘Past, Present, and Future of Swarm Robotics’, in Lecture Notes in Networks and Systems. Available at: https://doi.org/10.1007/978-3-030-82199-9_13.

      Chili, C., Xian, W. and Moss, C.F. (2009) ‘Adaptive echolocation behavior in bats for the analysis of auditory scenes’, Journal of Experimental Biology, 212(9), pp. 1392–1404. Available at: https://doi.org/10.1242/jeb.027045.

      Couzin, I.D. et al. (2002) ‘Collective Memory and Spatial Sorting in Animal Groups’, Journal of Theoretical Biology, 218(1), pp. 1–11. Available at: https://doi.org/10.1006/jtbi.2002.3065.

      Couzin, I.D. et al. (2005) ‘Effective leadership and decision-making in animal groups on the move’, Nature, 433(7025), pp. 513–516. Available at: https://doi.org/10.1038/nature03236.

      Davidson, J.D. et al. (2021) ‘Collective detection based on visual information in animal groups’, Journal of the Royal Society Interface, 18(180), p. 20210142. Available at: https://doi.org/10.1098/rsif.2021.0142.

      Faria Dias, P.G. et al. (2021) ‘Swarm robotics: A perspective on the latest reviewed concepts and applications’, Sensors. Available at: https://doi.org/10.3390/s21062062.

      Fujioka, E. et al. (2021) ‘Three-Dimensional Trajectory Construction and Observation of Group Behavior of Wild Bats During Cave Emergence’, Journal of Robotics and Mechatronics, 33(3), pp. 556–563. Available at: https://doi.org/10.20965/jrm.2021.p0556.

      Gautrais, J. et al. (2012) ‘Deciphering Interactions in Moving Animal Groups’, PLOS Computational Biology, 8(9), p. e1002678. Available at: https://doi.org/10.1371/JOURNAL.PCBI.1002678.

      Gillam, E.H. et al. (2010) ‘Echolocation behavior of Brazilian free-tailed bats during dense emergence flights’, Journal of Mammalogy, 91(4), pp. 967–975. Available at: https://doi.org/10.1644/09-MAMM-A-302.1.

      Goldstein, A. et al. (2024) ‘Collective Sensing – On-Board Recordings Reveal How Bats Maneuver Under Severe Acoustic Interference’, Under Review, pp. 1–25.

      Griffin, D.R., Webster, F.A. and Michael, C.R. (1958) ‘The echolocation of flying insects by bats’, Animal Behaviour, 8(3–4).

      Hagino, T. et al. (2007) ‘Adaptive SONAR sounds by echolocating bats’, International Symposium on Underwater Technology, UT 2007 - International Workshop on Scientific Use of Submarine Cables and Related Technologies 2007, pp. 647–651. Available at: https://doi.org/10.1109/UT.2007.370829.

      Hiryu, S. et al. (2008) ‘Adaptive echolocation sounds of insectivorous bats, Pipistrellus abramus, during foraging flights in the field’, The Journal of the Acoustical Society of America, 124(2), pp. EL51–EL56. Available at: https://doi.org/10.1121/1.2947629.

      Jakobsen, L. et al. (2024) ‘Velocity as an overlooked driver in the echolocation behavior of aerial hawking vespertilionid bats’. Available at: https://doi.org/10.1016/j.cub.2024.12.042.

      Jakobsen, L., Brinkløv, S. and Surlykke, A. (2013) ‘Intensity and directionality of bat echolocation signals’, Frontiers in Physiology, 4 APR(April), pp. 1–9. Available at: https://doi.org/10.3389/fphys.2013.00089.

      Jakobsen, L. and Surlykke, A. (2010) ‘Vespertilionid bats control the width of their biosonar sound beam dynamically during prey pursuit’, Proceedings of the National Academy of Sciences of the United States of America, 107(31). Available at: https://doi.org/10.1073/pnas.1006630107.

      Jhawar, J. et al. (2020) ‘Noise-induced schooling of fish’, Nature Physics, 16(4), pp. 488–493. Available at: https://doi.org/10.1038/s41567-020-0787-y.

      Kalko, E.K.V. (1995) ‘Insect pursuit, prey capture and echolocation in pipistrelle bats (Microchiroptera)’, Animal Behaviour, 50(4), pp. 861–880.

      Kazial, K.A., Burnett, S.C. and Masters, W.M. (2001) ‘Individual and Group Variation in Echolocation Calls of Big Brown Bats, Eptesicus Fuscus (Chiroptera: Vespertilionidae)’, Journal of Mammalogy, 82(2), pp. 339–351. Available at: https://doi.org/10.1644/1545-1542(2001)082<0339:iagvie>2.0.co;2.

      Kazial, K.A., Kenny, T.L. and Burnett, S.C. (2008) ‘Little brown bats (Myotis lucifugus) recognize individual identity of conspecifics using sonar calls’, Ethology, 114(5), pp. 469–478. Available at: https://doi.org/10.1111/j.1439-0310.2008.01483.x.

      Kothari, N.B. et al. (2014) ‘Timing matters: Sonar call groups facilitate target localization in bats’, Frontiers in Physiology, 5 MAY. Available at: https://doi.org/10.3389/fphys.2014.00168.

      Moss, C.F. and Surlykke, A. (2010) ‘Probing the natural scene by echolocation in bats’, Frontiers in Behavioral Neuroscience. Available at: https://doi.org/10.3389/fnbeh.2010.00033.

      Nagy, M. et al. (2010) ‘Hierarchical group dynamics in pigeon flocks’, Nature, 464(7290), pp. 890–893. Available at: https://doi.org/10.1038/nature08891.

      Neretti, N. et al. (2003) ‘Time-frequency model for echo-delay resolution in wideband biosonar’, The Journal of the Acoustical Society of America, 113(4), pp. 2137–2145. Available at: https://doi.org/10.1121/1.1554693.

      Parrish, J.K. and Edelstein-Keshet, L. (1999) ‘Complexity, Pattern, and Evolutionary Trade-Offs in Animal Aggregation’, Science, 284(5411), pp. 99–101. Available at: https://doi.org/10.1126/SCIENCE.284.5411.99.

      Partridge, B.L. (1982) ‘The Structure and Function of Fish Schools’, Scientific American, 246(6), pp. 114–123. Available at: https://doi.org/10.2307/24966618.

      Pearce, D.J.G. et al. (2014) ‘Role of projection in the control of bird flocks’, Proceedings of the National Academy of Sciences of the United States of America, 111(29), pp. 10422–10426. Available at: https://doi.org/10.1073/pnas.1402202111.

      Pitcher, T.J., Partridge, B.L. and Wardle, C.S. (1976) ‘A blind fish can school’, Science, 194(4268), pp. 963–965. Available at: https://doi.org/10.1126/science.982056.

      Rosenthal, S.B., Twomey, C.R., Hartnett, A.T., Wu, H.S., Couzin, I.D., et al. (2015) ‘Revealing the hidden networks of interaction in mobile animal groups allows prediction of complex behavioral contagion’, Proceedings of the National Academy of Sciences of the United States of America, 112(15), pp. 4690–4695. Available at: https://doi.org/10.1073/pnas.1420068112.

      Roy, S. et al. (2019) ‘Extracting interactions between flying bat pairs using model-free methods’, Entropy, 21(1). Available at: https://doi.org/10.3390/e21010042.

      Sabol, B.M. and Hudson, M.K. (1995) ‘Technique using thermal infrared-imaging for estimating populations of gray bats’, Journal of Mammalogy, 76(4). Available at: https://doi.org/10.2307/1382618.

      Saillant, P.A. et al. (1993) ‘A computational model of echo processing and acoustic imaging in frequency- modulated echolocating bats: The spectrogram correlation and transformation receiver’, The Journal of the Acoustical Society of America, 94(5). Available at: https://doi.org/10.1121/1.407353.

      Salles, A., Diebold, C.A. and Moss, C.F. (2020) ‘Echolocating bats accumulate information from acoustic snapshots to predict auditory object motion’, Proceedings of the National Academy of Sciences of the United States of America, 117(46), pp. 29229–29238. Available at: https://doi.org/10.1073/PNAS.2011719117.

      Sanderson, M.I. et al. (2003) ‘Evaluation of an auditory model for echo delay accuracy in wideband biosonar’, The Journal of the Acoustical Society of America, 114(3), pp. 1648–1659. Available at: https://doi.org/10.1121/1.1598195.

      Schnitzler, H.-U. and Kalko, E.K.V. (2001) ‘Echolocation by insect-eating bats’, BioScience, 51(7), p. 557. Available at: https://academic.oup.com/bioscience/article-abstract/51/7/557/268230 (Accessed: 17 March 2025).

      Schnitzler, H.-U. et al. (1987) ‘The echolocation and hunting behavior of the bat,Pipistrellus kuhli’, Journal of Comparative Physiology A, 161(2), pp. 267–274. Available at: https://doi.org/10.1007/BF00615246.

      Simmons, J.A. and Kick, S.A. (1983) ‘Interception of Flying Insects by Bats’, Neuroethology and Behavioral Physiology, pp. 267–279. Available at: https://doi.org/10.1007/978-3-642-69271-0_20.

      Strandburg-Peshkin, A. et al. (2013) ‘Visual sensory networks and effective information transfer in animal groups’, Current Biology. Cell Press. Available at: https://doi.org/10.1016/j.cub.2013.07.059.

      Sumpter, D.J.T. et al. (2008) ‘Consensus Decision Making by Fish’, Current Biology, 18(22), pp. 1773–1777. Available at: https://doi.org/10.1016/J.CUB.2008.09.064.

      Surlykke, A., Ghose, K. and Moss, C.F. (2009) ‘Acoustic scanning of natural scenes by echolocation in the big brown bat, Eptesicus fuscus’, Journal of Experimental Biology, 212(7), pp. 1011–1020. Available at: https://doi.org/10.1242/JEB.024620.

      Theriault, D.H. et al. (no date) ‘Reconstruction and analysis of 3D trajectories of Brazilian free-tailed bats in flight’ [Preprint]. Available at: https://cs-web.bu.edu/faculty/betke/papers/2010-027-3d-bat-trajectories.pdf (Accessed: 4 May 2023).

      Ulanovsky, N. and Moss, C.F. (2008) ‘What the bat’s voice tells the bat’s brain’, Proceedings of the National Academy of Sciences of the United States of America, 105(25), pp. 8491–8498. Available at: https://doi.org/10.1073/pnas.0703550105.

      Vanderelst, D. and Peremans, H. (2018) ‘Modeling bat prey capture in echolocating bats : The feasibility of reactive pursuit’, Journal of theoretical biology, 456, pp. 305–314.

      Youssefi, K.A.R. and Rouhani, M. (2021) ‘Swarm intelligence based robotic search in unknown maze-like environments’, Expert Systems with Applications, 178. Available at: https://doi.org/10.1016/j.eswa.2021.114907.

      Yovel, Y. et al. (2009) ‘The voice of bats: How greater mouse-eared bats recognize individuals based on their echolocation calls’, PLoS Computational Biology, 5(6). Available at: https://doi.org/10.1371/journal.pcbi.1000400.

      Yovel, Y. and Ulanovsky, N. (2017) ‘Bat Navigation’, The Curated Reference Collection in Neuroscience and Biobehavioral Psychology, pp. 333–345. Available at: https://doi.org/10.1016/B978-0-12-809324-5.21031-6.

    1. Author response:

      We thank the reviewers for their thorough evaluation and constructive feedback on our manuscript.

      We think that their valuable suggestions will strengthen the manuscript and help us clarify several important points.

      All reviewers acknowledged the importance of our theoretical results and network classification in making pattern formation analysis a more tractable problem. At the same time, they have also raised a number of important concerns that we shall carefully consider.

      A. A major clarification that the reviewers found important concerns the definition of non-trivial pattern transformations and its generalization to higher dimensions. In this regard, the reviewers’ comments are:

      Reviewer #1:

      (on non-trivial pattern transformations):

      (3) All modelling is confined to one spatial dimension, and the very definition of a "non-trivial" transformation is framed in terms of peak positions along a line, which clearly must be reformulated for higher dimensions. It's well-known that diffusions in 1, 2, and 3 dimensions are also dramatically different, so the relevance of the three-class taxonomy to real multicellular tissues remains unclear, or at least should be explained in more detail.

      Reviewer #2:

      (on non-trivial pattern transformations):

      (5) The definition of non-trivial pattern formation is provided only in the Supplementary Information, despite its central importance for interpreting the main results. It would significantly improve clarity if this definition were included and explained in the main text. Additionally, it remains unclear how the definition is consistently applied across the different initial conditions. In particular, the authors should clarify how slope-based measures are determined for both the random noise and sharp peak/step function initial states. Furthermore, the authors do not specify how the sign function is evaluated at zero. If the standard mathematical definition sgn(0)=0 is used, then even a simple widening of a peak could fulfill the criterion for nontrivial pattern transformation.

      We agree with Reviewer #2 that including a more detailed definition of non-trivial pattern transformation in the main text would enhance the clarity of the paper. The one-dimensional (1D) definition currently provided in the Supplementary Information was chosen because all computations presented therein involve exclusively one-dimensional patterns. However, we acknowledge that this definition, as written, does not generalize unambiguously to higher dimensions. Therefore, in a revised version of the manuscript, we will incorporate an expanded definition applicable to higher-dimensional cases.

      This general definition of a non-trivial pattern transformation should make no reference to the sign of spatial derivatives of either the initial or resulting patterns. Specifically, a pattern transformation is considered non-trivial if it satisfies the following criteria:

      - It is heterogeneous: The resulting pattern is heterogeneous in space.

      - It is rearranging: The arrangement of critical points (i.e. peaks, valleys and saddle points in a gene product concentration) along the domain in the resulting pattern of a gene product is different to the arrangement of critical points in its initial pattern. This includes the emergence of new critical points, the disappearance of existing ones, or the spatial displacement of critical points from one location to another.

      - It is non-replicating: The spatial arrangement of critical points in the pattern of one gene product must differ from that of any other upstream gene product.

      Nonetheless, our initial patterns are spatially discontinuous functions, for which critical points are not strictly defined: in homogeneous initial patterns, the white noise is discontinuous by definition; and for the spike and spike+homogeneous initial patterns, we use sharp spikes defined by the rectangular function, which is discontinuous at the spike boundaries. Therefore, the aforementioned definition should be supplemented with the following two ad hoc assumptions:

      - Homogeneous initial patterns do not comprise any critical point. White noise in this type of initial patterns represents small thermodynamic fluctuations around the steady state and, for the purpose of pattern transformation, this is equivalent to a constant concentration along the domain.

      - Spike and spike+homogeneous initial patterns each contain a single critical point located at the center of the spike. The sharp spikes, modeled using the rectangular function, serve as a theoretical idealization to facilitate mathematical analysis. Once diffusion begins to act, these sharp boundaries are smoothed into differentiable gradients, maintaining a unique critical point at the center of the initial spike, which is the most relevant information for pattern transformation.
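      As a rough one-dimensional illustration of the criteria and ad hoc assumptions above, the following Python sketch locates the interior peaks and valleys of a sampled concentration profile and compares their arrangement before and after a transformation. The function names (`critical_points`, `is_heterogeneous`, `is_non_trivial`), the tolerance `tol` and the choice to place the critical point of a flat-topped spike at its centre are assumptions made for this illustration; they are not taken from the manuscript. Saddle points do not occur in 1D and are therefore not handled.

```python
from typing import List, Sequence, Tuple

def critical_points(profile: Sequence[float], tol: float = 1e-9) -> List[Tuple[int, str]]:
    """Return (index, kind) for interior peaks and valleys of a sampled 1D profile.

    Differences no larger than `tol` count as flat. Choosing `tol` above the
    noise amplitude makes a noisy homogeneous profile have no critical points,
    mirroring the first ad hoc assumption.
    """
    points: List[Tuple[int, str]] = []
    last_sign = 0      # sign of the last non-flat slope seen
    flat_start = None  # index where the current flat run began
    for i in range(1, len(profile)):
        diff = profile[i] - profile[i - 1]
        sign = 0 if abs(diff) <= tol else (1 if diff > 0 else -1)
        if sign == 0:
            if flat_start is None:
                flat_start = i - 1
            continue
        if last_sign != 0 and sign != last_sign:
            # Slope changed sign: put the critical point at the centre of any
            # flat run, e.g. the top of a rectangular spike.
            left = flat_start if flat_start is not None else i - 1
            centre = (left + i - 1) // 2
            points.append((centre, "peak" if last_sign > 0 else "valley"))
        last_sign = sign
        flat_start = None
    return points

def is_heterogeneous(profile: Sequence[float], tol: float = 1e-9) -> bool:
    """True if the profile varies in space by more than `tol`."""
    return max(profile) - min(profile) > tol

def is_non_trivial(initial: Sequence[float], final: Sequence[float],
                   upstream_finals: Sequence[Sequence[float]] = (),
                   tol: float = 1e-9) -> bool:
    """Check the heterogeneous, rearranging and non-replicating criteria."""
    final_cp = critical_points(final, tol)
    return (is_heterogeneous(final, tol)
            and final_cp != critical_points(initial, tol)
            and all(final_cp != critical_points(u, tol) for u in upstream_finals))

spike = [0, 0, 1, 1, 1, 0, 0]
widened = [0, 0.2, 0.6, 1.0, 0.6, 0.2, 0]
two_peaks = [0, 1, 0, 0, 0, 1, 0]
print(is_non_trivial(spike, widened))    # widening keeps the single central peak
print(is_non_trivial(spike, two_peaks))  # new peaks and a valley appear
```

      In this sketch a simple widening of the spike is classified as trivial because the single critical point stays at the same position, whereas the two-peak profile is classified as non-trivial unless an upstream gene product already shows the same arrangement.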

      Finally, it is worth recalling that our gene network classification is fundamentally based on an analysis of the dispersion relation associated with the gene network, and the construction of this dispersion relation is independent of the spatial dimensionality of the domain (i.e. it does not require assuming any specific number of dimensions). Having the description of this dispersion relation only in the SI may have made the article harder to follow; it will therefore be moved to the main text in an upcoming version. Thus, the gene networks that can lead to pattern transformation are the same in 1D, 2D or 3D. As for the resulting patterns, the broad description we provide also applies to any number of dimensions: they would be periodic, non-periodic as in the amplified-noise patterns, or non-periodic as in the hierarchic networks. For the latter, notice that, except for boundary effects that we discuss later, the spike initial condition is radially symmetric and thus the patterns resulting from it will also be radially symmetric. We will make this point more explicit in a revised version of the article, especially since, as suggested, this important portion of the Supplementary Information will be incorporated into the main text.

      Reviewer #2 suggests that, with our definition of non-trivial pattern transformation, the simple widening of a concentration peak would constitute a non-trivial pattern transformation. This is not the case, as the figures already show by example, since a widening does not change the position of the critical point. A different situation would apply if a wide and completely flat concentration peak (i.e. a plateau) formed. As we will explain in the coming version, this is not possible because of requirement (R5).

      We think that this clarification of the definition of non-trivial pattern transformation will also help clarify the next point (B below) since it would make it clearer that this article does not intend to explain which specific resulting pattern would arise from any given gene network.

      B. The main concern among these relates to the validity of our linearization of the model equations and the extension of the results obtained for the linear system to the fully nonlinear system. In this regard, the reviewers’ comments are:

      Reviewer #1:

      (on linearization):

      (2) A central step in the model formulation is the linearisation of the reaction term around a homogeneous steady state; higher-order kinetics, including ubiquitous bimolecular sinks such as A + B → AB, are simply collapsed into the Jacobian without any stated amplitude bound on the perturbations. Because the manuscript never analyses how far this assumption can be relaxed, the robustness of the three-class taxonomy under realistic nonlinear reactions or large spike amplitudes remains uncertain.

      Reviewer #2:

      (on linearization):

      (2) Most of the proofs presented in the Supplementary Information rely on linearized versions of the governing equations, and it remains unclear how these results extend to the fully nonlinear system. We are concerned that the generality of the conclusions drawn from the linear analysis may be overstated in the main text. For example, in Section S3, the authors introduce the concept of dynamic equivalence of transitive chains (Proposition S3.1) and intracellular transitive M-branching (Proposition S3.2), which pertains to the system's steady-state behavior. However, the proof is based solely on the linearized equations, without additional justification for why the result should hold in the presence of nonlinearities. Moreover, the linearized system is used to analyze the response to a "spike initial pattern of arbitrary height C" (SI Chapter S5.1), yet it is not clear how conclusions derived from the linear regime can be valid for large perturbations, where nonlinear effects are expected to play a significant role. We encourage the authors to clarify the assumptions under which the linearized analysis remains valid and to discuss the potential limitations of applying these results to the nonlinear regime.

      In this article, we address two main questions: first, which gene network topologies can give rise to non-trivial pattern transformations; and second, which broad types of resulting patterns these gene network topologies can give rise to. Thus, we do not intend to explain which exact resulting patterns would arise from any given gene network (i.e. a gene network topology with specific functions and interaction strengths or weights), a question for which non-linearities do indeed matter.

      For most known gene regulatory networks, the available empirical information is typically limited to the nature of gene product regulations (indicating whether they act as activators or inhibitors), while details about the specific functional form of these regulations are rare. For instance, given two gene products, i and j, the network may indicate that i acts as an activator of j, implying that the concentration of j increases with that of i. However, this increase could follow a variety of functional forms: it may be quadratic, cubic, or any other function f_j(g_i). As we explain in the description of our model, we restrict our study to functions with a monotonicity constraint: higher concentrations of i lead to increased production of j (i.e., ∂f_j/∂g_i > 0). In other words, a given gene interaction is always either inhibitory or activatory; it does not change sign. This monotonicity constraint corresponds to requirement (R5) in our main text. This requirement is based on the biologically plausible idea that the complexity of gene regulation in development stems more from the topology of gene networks than from the complexity of the functions by which one gene product regulates another (i.e. we use simple monotonic functions).

      Question 1: The key to understanding question 1 lies in the dispersion relation, which was explained in the SI. From the reviewers’ comments it is clear that having this crucial part in the main text of an upcoming version of the article would make it easier to follow, especially for question 1.

      In brief, any pattern transformation requires the initial pattern to change. The trigger of such change is a change in the concentration of some gene product, conceptualized either as a noise fluctuation (in the homogeneous initial pattern) or as a regulated change at a specific point (in the spike initial pattern). Mathematically, both can be treated as perturbations and, for pattern transformation to be possible, such a perturbation should grow so that the initial pattern becomes unstable and can change into a different resulting pattern.

      If the perturbation is small, one can use the standard linear perturbation analysis in S6.2 of our Supplementary Information. In other words, the linear analysis is enough to ascertain whether a small perturbation would grow or not. A gene network in which this does not happen would be unable to lead to pattern transformation, whatever the nonlinear part of f(g). In that sense, the linear approximation provides a necessary condition that any gene network needs to fulfill to lead to pattern transformation.
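      To make the necessary condition concrete, here is a minimal Python sketch of a dispersion relation for a two-gene network. It assumes the textbook linearized reaction-diffusion form, in which a perturbation with wavenumber k grows at a rate given by the largest real part of the eigenvalues of J − k²D, with J the Jacobian at the homogeneous steady state and D a diagonal matrix of diffusion coefficients. The function names and the example values of `J` and `D` are hypothetical and are not taken from the manuscript, whose notation in S6.2 may differ.

```python
import cmath
from typing import Iterable, List

def growth_rate(J: List[List[float]], D: List[float], k: float) -> float:
    """Largest real part of the eigenvalues of J - k^2 * diag(D), 2x2 case."""
    a = J[0][0] - D[0] * k * k
    b = J[0][1]
    c = J[1][0]
    d = J[1][1] - D[1] * k * k
    trace = a + d
    det = a * d - b * c
    # Eigenvalues of a 2x2 matrix from its trace and determinant.
    root = cmath.sqrt(trace * trace - 4.0 * det)
    return max(((trace + root) / 2).real, ((trace - root) / 2).real)

def can_destabilise(J: List[List[float]], D: List[float],
                    wavenumbers: Iterable[float]) -> bool:
    """True if some sampled wavenumber has a positive growth rate."""
    return any(growth_rate(J, D, k) > 0 for k in wavenumbers)

# Hypothetical activator-inhibitor Jacobian, stable without diffusion.
J = [[1.0, -1.0], [3.0, -2.0]]
ks = [0.05 * n for n in range(100)]
print(can_destabilise(J, [1.0, 20.0], ks))  # unequal diffusion coefficients
print(can_destabilise(J, [1.0, 1.0], ks))   # equal diffusion coefficients
```

      With these illustrative values, the uniform mode (k = 0) decays in both cases, but only the network with unequal diffusion coefficients has a band of growing wavenumbers; the other one fails the necessary condition regardless of any nonlinear terms added to it.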

      However, the linear analysis cannot ascertain whether a specific gene network will actually lead to pattern transformation (i.e., the condition is not sufficient). This, as well as the shape of the specific resulting pattern, may also depend on the non-linear parts. As we discuss, based on the dispersion relation and other complementary arguments throughout the article, we can also gain some insights into the possible patterns from the linear approximation alone (question 2). These arguments hold thanks to the imposition of requirements (R1-R5) on the function f(g), which prevent strange behaviors stemming from the nonlinear part of the equation.

      The amplitude bound on perturbations mentioned by Reviewer #1 is addressed by requirements (R2) and (R4). Although the solution to the linear system predicts unbounded growth of unstable eigenmodes, we assume functions f(g) whose nonlinear terms eventually halt this growth, thereby ensuring the boundedness of solutions as imposed by (R4). This assumption on the nonlinear part is precisely requirement (R2) on f(g) in the main text.

      The transitive chains and branchings in section S3 of the Supplementary Information mentioned by Reviewer #2 are topological properties of gene networks and therefore influence only the linear part of the reaction-diffusion equations. This is why the proofs in that section are based on the linearized equations. We agree that clarifying this point in the text, as suggested by the reviewer, would improve the reader’s understanding of the section.

      Regarding Reviewer #2’s concerns about large perturbations, we acknowledge that the phrasing “arbitrary height” may be confusing. For the homogeneous initial conditions, these perturbations are assumed to be small because they are actually molecular noise (otherwise the initial condition could not be considered homogeneous in the classical sense of developmental biology models). In the spike initial conditions in hierarchic networks, the perturbation is not necessarily small. For the analysis provided in the SI, we indeed assume that the perturbations are small enough for the linear approximation to be valid. Notice, however, that since these networks require an intracellular self-activating loop upstream of the first extracellular signal, the effective perturbation would rapidly grow to a value determined by that loop.

      In general, the height of the initial spike does not affect the fact that hierarchic networks can lead to non-trivial pattern transformation. By definition, these networks require the secretion of an extracellular signal from the cells in the spike (otherwise no change in gene product concentrations can occur over space). By definition, this signal is not produced by any other cells and, thus, its concentration is governed by its production in the cells in the spike and its diffusion away from them. Thus, whatever the initial height of the spike and whatever the non-linearities in f(g), the signal’s concentration would decrease with the distance from the spike. As explained in the main text, this would lead to non-trivial pattern transformations if other general conditions are met. In general, the height of the initial perturbation can affect which specific pattern transformation would arise from a specific gene network, but not which gene network topologies can lead to pattern transformation. This will be more clearly stated in an upcoming version of the article.

      C. In the following, we respond to the remaining concerns raised by the reviewers:

      Reviewer #1:

      (1) The Results section is difficult to follow. Key logical steps and network configurations are described shortly in prose, which constantly require the reader to address either SI or other parts of the text (see numerous links on the requirements R1-R5 listed at the beginning of the paper) to gain minimal understanding. As a result, a scientifically literate but non-specialist reader may struggle to grasp the argument with a reasonable time invested.

      We acknowledge that the current version of the main text may not be as clear as we intended. Initially, we believed that placing the more technical mathematical passages in the Supplementary Information would make the main text more accessible to readers. However, we agree with the reviewer that including some of these computations in the main text could improve clarity. We also believe that adding a summary table outlining all the model’s requirements would further contribute to that goal.

      Reviewer #2:

      (1) We have serious concerns regarding the validity of the simulation results presented in the manuscript. Rather than simulating the full nonlinear system described by Equation (1), the authors base their results on a truncated expansion (Equation S.8.2) that captures only the time evolution of small deviations around a spatially homogeneous steady state. However, it remains unclear how this reduced system is derived from the full equations (specifically, which terms are retained or neglected and why) and how the expansion of the nonlinear function can be steady-state independent, as claimed. Additionally, in simulations involving the spike plus homogeneous initial condition, it is not evident (or, where equations are provided, it is not correct) that the assumed global homogeneous background actually corresponds to a steady state of the full dynamics. We elaborate on these concerns in the following:

      We believe there has been a misunderstanding regarding the presentation of the model equations (S8.2) used throughout our simulations. Accordingly, we agree that this relevant section of the Supplementary Information should be rewritten in a revised version of the manuscript to clarify this issue. Below, we address all the concerns raised by the reviewer.

      Equation (S8.2) represents the full nonlinear system described in Equation (1). While we recognize that the model may oversimplify real biological processes, its purpose is to illustrate our general statements about pattern formation rather than to capture any specific or detailed mechanism. In this context, model (S8.2) offers three key advantages for our goals: it allows rapid manipulation of gene network topology simply by modifying the matrix J, making it ideal for illustrating pattern formation across different network classes; it accommodates gene networks of arbitrary size (unlike other models, such as the classical Gierer-Meinhardt model, which are limited to two-element Turing or noise-amplifying networks); and, due to the simplicity of its nonlinear terms, this model involves relatively few free parameters, facilitating the fine-tuning needed to identify parameter regions where non-trivial pattern transformations occur.

      Indeed, we find that the ability of model (S8.2) to illustrate our results despite having such simple nonlinear terms (bearing in mind that at least some nonlinearity is always necessary for self-organization) strongly supports the claim that the capacity of a gene network to produce pattern transformations is fully determined by the linear part of Equation (1). In this sense, nonlinear terms primarily influence the precise parameter values at which these transformations occur and contribute to shaping specific features of the resulting patterns.

      Model (S8.2) has been successfully employed in pattern formation studies elsewhere in the literature; accordingly, we provide relevant bibliographic references to support its widespread use.

      We believe the misunderstanding arises from our explanation of the biological interpretation of the model. As noted in the accompanying bibliography, the model is based on a general reaction-diffusion mechanism assuming the existence of a steady state. However, this conceptual reaction-diffusion framework is not the same as our Equation (1); rather, it was introduced by the original proponents of the model in the seminal paper cited in our text. In this context, Equation (S8.2) describes small concentration perturbations around that steady state, where the variables represent deviations in concentration relative to the general steady state.

      The aforementioned general steady state corresponds to the trivial equilibrium point g≡0 in equations (S8.2). Consequently, all our simulations based on model (S8.2) start from this steady state, to which we add white noise to generate homogeneous initial patterns or a sharp spike for the two types of spike initial patterns.

      It is also worth noting that Equations (S8.2) represent a non-dimensional model.
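      For illustration only, a model of this kind can be sketched in Python as an explicit finite-difference scheme. The functional form used here (linear coupling through the matrix J, a cubic degradation term with coefficients `mu`, and diffusion with coefficients `D`) is an assumption inferred from the reviewer's description of Equations (S8.2), not the authors' exact equations; the periodic boundaries, the explicit Euler time stepping and all function and parameter names are likewise hypothetical.

```python
from typing import List

Field = List[List[float]]  # g[i][x]: deviation of gene product i at cell x

def step(g: Field, J: List[List[float]], D: List[float], mu: List[float],
         dt: float, dx: float) -> Field:
    """One explicit Euler step of
    dg_i/dt = sum_j J[i][j] * g_j - mu[i] * g_i**3 + D[i] * d2g_i/dx2
    on a 1D domain with periodic boundaries (an assumption of this sketch).
    """
    n_genes, n_cells = len(g), len(g[0])
    new_g: Field = []
    for i in range(n_genes):
        row = []
        for x in range(n_cells):
            left = g[i][(x - 1) % n_cells]
            right = g[i][(x + 1) % n_cells]
            laplacian = (left - 2.0 * g[i][x] + right) / (dx * dx)
            reaction = sum(J[i][j] * g[j][x] for j in range(n_genes))
            reaction -= mu[i] * g[i][x] ** 3  # cubic term bounds the growth
            row.append(g[i][x] + dt * (reaction + D[i] * laplacian))
        new_g.append(row)
    return new_g

def simulate(g0: Field, J: List[List[float]], D: List[float], mu: List[float],
             dt: float = 0.01, dx: float = 1.0, n_steps: int = 1000) -> Field:
    """Integrate from the initial deviations g0 without modifying g0."""
    g = [row[:] for row in g0]
    for _ in range(n_steps):
        g = step(g, J, D, mu, dt, dx)
    return g

# g = 0 is an equilibrium: with no noise or spike added, nothing happens.
rest = simulate([[0.0] * 8], [[1.0]], [0.1], [1.0], n_steps=10)

# A spike on top of g = 0 decays when the linear part is stable ...
decayed = simulate([[0.0, 0.0, 1.0, 0.0, 0.0]], [[-1.0]], [0.1], [1.0])

# ... while an unstable linear part grows until the cubic term halts it.
saturated = simulate([[0.01] * 5], [[1.0]], [0.0], [1.0], n_steps=2000)
print(max(rest[0]), max(decayed[0]), max(saturated[0]))
```

      In the last run the single self-activating gene product grows from a small uniform deviation and settles where the linear and cubic terms balance, which illustrates how a simple nonlinear term keeps solutions bounded without determining whether growth occurs in the first place.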

      It is assumed that the homogeneous steady states are given by g_i=0 and g_i=c_i, where 1/c_i = \mu_i or \hat{\mu}_i, independently of the specific network structure. However, the basis for this assumption is unclear, especially since some of the functions do not satisfy this condition -for example, f5 as defined below Eq. S8.10.5. Moreover, if g_i=c_i does not correspond to a true steady state, then the time evolution of deviations from this state is not correctly described by Eq. S8.2, as the zeroth-order terms do not vanish in that case.

      From the explanations above, it is important to distinguish two scales in the process: the scale of small perturbations, where equations (S8.2) apply; and the global scale, where the conceptual general reaction-diffusion system operates. Since the specific form of this general system does not affect equations (S8.2), we assume that it follows any of the models cited in the text, which yield a non-zero steady state.

      In this sense, Equations (S8.2) represent a small concentration deviation from such a global system, and g(t, x) is a relative concentration: g≡0 represents the steady-state concentration, g>0 represents concentrations above it, and g<0 represents concentrations below it.

      As previously mentioned, simulations are performed using Equations (S8.2) starting from the equilibrium point g≡0. The result of these simulations is then superimposed on the non-zero steady state and presented in the figures throughout the article.

      Using the full model instead of the simplified Equations (S8.2) may result in slightly different resulting patterns, but it does not affect the gene network’s ability to produce pattern transformations, nor does it alter the main structural properties of the patterns—for example, the periodic nature of patterns generated by Turing networks.

      Additionally, the equations used contain only linear terms and a cubic degradation term for each species g_i, while neglecting all quadratic terms and cubic terms involving cross-species interactions (i≠j). An explanation for this selective truncation is not provided, and without knowledge of the full equation (f), it is impossible to assess whether this expansion is mathematically justified. If, as suggested in the Supplementary Information, the linear and cubic terms are derived from f, then at the very least, the Jacobian matrix should depend on the background steady-state concentration. However, the equations for the small deviation around a steady state (including the Jacobian matrix) used in the simulations appear to be independent of the particular steady state concentration.

      The Jacobian of Equation (S8.2) is independent of g because g represents a small perturbation around a steady state of a general reaction-diffusion system. Consequently, the matrix J corresponds to the Jacobian of the general system evaluated at that steady state. Evaluating the Jacobian of equations (S8.2) at the equilibrium point g≡0 -which represents the general steady state- recovers the matrix J.
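
      As a minimal numerical sketch of this point, assume a reaction term of the form f_i(g) = Σ_j J_ij g_j − μ_i g_i³, that is, linear terms plus a per-species cubic degradation term, as described in the reviewer's comment. The matrix `J`, the coefficients `mu`, and the helper names below are hypothetical and chosen only for illustration; they are not the values or the exact equations used in the article. Under this assumption, a finite-difference Jacobian evaluated at g≡0 returns the matrix J, because the cubic terms have zero derivative there.

```python
def reaction(g, J, mu):
    """Assumed reaction term: linear part J @ g plus cubic degradation."""
    n = len(g)
    return [sum(J[i][j] * g[j] for j in range(n)) - mu[i] * g[i] ** 3
            for i in range(n)]


def numerical_jacobian(f, g, eps=1e-6):
    """Central-difference Jacobian of f evaluated at the point g."""
    n = len(g)
    jac = [[0.0] * n for _ in range(n)]
    for j in range(n):
        plus = list(g)
        minus = list(g)
        plus[j] += eps
        minus[j] -= eps
        f_plus, f_minus = f(plus), f(minus)
        for i in range(n):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2 * eps)
    return jac


# Hypothetical two-gene example (values are illustrative only).
J = [[-1.0, 2.0], [-3.0, 0.5]]
mu = [1.0, 4.0]
jac_at_zero = numerical_jacobian(lambda g: reaction(g, J, mu), [0.0, 0.0])
```

      Evaluating the same Jacobian away from g≡0 would pick up a contribution from the cubic term, which is why the statement applies specifically to the equilibrium point.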

      This is why we believe that the differences observed between the spike-only initial condition and the spike superimposed on a homogeneous background are not due to the initial conditions themselves, but rather result from a modified reaction scheme introduced through a questionable cutoff.

      "In simulations with spike initial patterns, the reference value g≡0 represents an actual concentration of 0 and therefore, we must add to (S8.2) a Heaviside function Φ acting of f (i.e., Φ(f(g))=f(g) if f(g)>0 , Φ(f(g))=0 if f(g)≤0 ) to prevent the existence of negative concentrations for any gene product (i.e., g_i<0 for some i )." (SI chapter S8).

      This cutoff alters the dynamics (no inhibition) and introduces a different reaction scheme between the two simulations. The need for this correction may itself reflect either a problem in the original equations (which should fulfill the necessary conditions and prevent negative concentrations (R4 in main text)) or the inappropriateness of using an expanded approximation which assumes independence on the steady state concentration. It is already questionable if the linearized equations with a cubic degradation term are valid for the spike initial conditions (with different background concentration values), as the amplitude of this perturbation seems rather large.

      For homogeneous and spike+homogeneous initial conditions, we interpret equations (S8.2) as small perturbations around a non-zero steady state of a general reaction-diffusion system. For spike-only initial conditions, that steady state is zero. As we mentioned before, g≡0 then represents this steady state of zero concentration, g>0 represents positive concentrations of the general system, and g<0 would represent unfeasible negative concentrations of the general system. Therefore, the use of a cutoff function to handle such initial conditions is justified. Moreover, this cutoff function is the same as the one employed in the reference general system cited in our paper.
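
      The cutoff quoted above from SI chapter S8 can be sketched as follows. This is only an illustration of the definition Φ(f(g))=f(g) if f(g)>0 and Φ(f(g))=0 otherwise; the function names and the linear `example_reaction` are hypothetical and are not taken from the article's code.

```python
def cutoff(values):
    """Apply the cutoff elementwise: keep positive entries, set the rest to 0."""
    return [v if v > 0 else 0.0 for v in values]


def reaction_with_cutoff(g, reaction):
    """Evaluate the cutoff of f(g) for a user-supplied reaction function f."""
    return cutoff(reaction(g))


# Hypothetical linear reaction term, used only to exercise the cutoff.
def example_reaction(g):
    return [g[0] - 2.0 * g[1], -g[0] + 0.5 * g[1]]


rates = reaction_with_cutoff([3.0, 1.0], example_reaction)
```

      In this toy case the second gene product would have a negative production rate without the cutoff; with it, the rate is set to zero while the first rate is left unchanged.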

      We acknowledge that the cutoff influences the simulations and accounts for the differences observed between spike and spike+homogeneous initial conditions. However, this distinction reflects what occurs in real biological systems, which is precisely why we differentiate these two types of initial states. For instance, the emergence of a periodic pattern in a noise-amplifying network depends critically on the formation of regions with concentrations below the steady state near the initial spike. Such regions can form in spike-plus-homogeneous initial patterns but not in spike-only initial patterns, where concentrations below the steady state would correspond to biologically unfeasible negative values.

      Lastly, we note that under the current simulation scheme, it is not possible to meaningfully assess criteria RH2a and RH2b, as they rely on nonlinear interactions that are absent from the implemented dynamics.

      It is explicitly stated in the relevant subsections of Section S7 in the Supplementary Information that, for the simulations involving RH2a and RH2b, the function f(g) in equation (S8.2) is modified by adding an ad hoc quadratic term to enable the assessment of these criteria.

      (3) Several statements in the main text are presented without accompanying proof or sufficient explanation, which makes it difficult to assess their validity. In some cases, the lack of justification raises serious doubts about whether the claims are generally true. Examples are:

      "For the purpose of clarity we will explain our results as if these cells have a simple arrangement in space (e.g., a 1D line or a 2D square lattice) but, as we will discuss, our results shall apply with the same logic to any distribution of cells in space." (Main text l.145-l.148).

      We believe that the confusion in this statement arises from the ambiguous use of the phrase “our results”. We will revise the text to provide a more precise description. Specifically, by “our results,” we refer to the conclusion that it is possible to determine whether a gene network leads to nontrivial pattern transformations based solely on its topology. This conclusion is independent of the dimensionality of space, as none of our arguments rely on assumptions specific to spatial dimensions. While one-dimensional examples are used for clarity and illustration, the underlying reasoning applies generally. In an improved version of the article, we will clarify this point explicitly and move relevant arguments from the Supplementary Information into the main text.

      Critically, our classification of gene networks is ultimately based on an argument concerning the dispersion relation associated with the network, and the construction of this dispersion relation is independent of the spatial dimensionality of the domain. In this sense, the networks identified in the text as capable of producing pattern transformations will be able to generate non-trivial pattern transformations in any spatial domain and in any number of dimensions. While the specific parameter values that permit such transformations may vary depending on the geometry and dimensionality of the domain, the existence of at least one such parameter set remains unaffected.

      The geometry of the domain can influence the specific form of the resulting patterns, but it does not alter the broader class of patterns (e.g., periodic patterns, peaks emerging around a spike, etc.) that a given gene network topology can produce. One such geometric influence, commonly observed in simulations, involves boundary effects. For example, structures such as peaks or rings forming near the boundaries may appear higher, broader, or spatially shifted compared to those arising in the central regions of the domain. However, we think a pattern consisting of a periodic train of peaks where only those near the boundary are slightly different can still be classified as a periodic pattern.

      "For any non-trivial pattern transformation (as long as it is symmetric around the initial spike), there exists an H gene network capable of producing it from a spike initial pattern." (Main text l.366f).

      A justification for this statement is provided shortly after the claim, although we acknowledge that the current explanation is somewhat cumbersome and would benefit from a clearer presentation in a revised version of the main text.

      A more detailed justification is provided in the Supplementary Information, based on three key ideas. First, any pattern (provided it is symmetric with respect to the initial spike) can be described as an arrangement of peaks with varying heights and spatial positions along a one-dimensional domain. Second, there exists a simple gene network—the diamond network—that, through parameter tuning, can produce two peaks of arbitrary height and symmetric position relative to the initial spike. Third, by placing multiple diamond networks positively upstream of a common gene product, that gene product can express peaks at each location where the upstream diamond networks induce them. Under mild additional conditions, this mechanism allows the formation of essentially any symmetric pattern. These mild conditions, along with a detailed analysis of the diamond network’s ability to generate peaks with controllable height and position, are discussed in the Supplementary Information.

      "In 2D there are no peaks but concentric rings of high gene product concentration centered around the spike, while in 3D there are concentric spherical shells." (Main text l. 447ff).

      This result pertains specifically to pattern transformations arising from spike initial patterns. As defined in the text, spike initial patterns are radially symmetric. Since diffusion preserves radial symmetry, pattern transformations from spike initial patterns in two or three dimensions reduce to effectively one-dimensional transformations along each radial direction. In this framework, each pair of concentration peaks symmetric with respect to the spike in one dimension corresponds to a ring surrounding the spike in two dimensions, and each ring in two dimensions becomes a hollow spherical shell around the spike in three dimensions.
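
      The geometric part of this argument can be illustrated with a short sketch. It does not simulate any dynamics; it only evaluates a hypothetical one-dimensional radial profile (a single Gaussian peak, with `peak_radius` and `width` chosen arbitrarily) at the distance of each 2D point from the spike, which shows how one radial peak corresponds to a ring.

```python
import math


def radial_profile(r, peak_radius=3.0, width=0.5):
    """Hypothetical 1D profile: a single peak at distance peak_radius."""
    return math.exp(-((r - peak_radius) ** 2) / (2 * width ** 2))


def field_2d(x, y):
    """Radially symmetric 2D field obtained by evaluating the profile at r."""
    return radial_profile(math.hypot(x, y))


# Points at the same distance from the spike share the same concentration,
# so the 1D peak at r = 3 traces out a ring of radius 3 in 2D.
on_ring = [field_2d(3.0 * math.cos(t), 3.0 * math.sin(t))
           for t in (0.0, 1.0, 2.5, 4.0)]
at_center = field_2d(0.0, 0.0)
```

      Replacing `math.hypot(x, y)` with the three-dimensional distance would give the corresponding spherical shell.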

      We agree that including a brief section in the Supplementary Information to clarify these subtleties would be helpful for readers to better understand the generalization of certain patterns to higher dimensions.

      (4) The study identifies one-signal networks and examines how combinations of these structures can give rise to minimal pattern-forming subnetworks. However, the analysis of the combinations of these minimal pattern-forming subnetworks remains relatively brief, and the manuscript does not explore how the results might change if the subnetworks were combined in upstream and downstream configurations. In our view, it is not evident that all possible gene regulatory networks can be fully characterized by these categories, nor that the resulting patterns can be reliably predicted. Rather, the approach appears more suited to identifying which known subnetworks are present within a larger network, without necessarily capturing the full dynamics of more complex configurations.

      We acknowledge that our explanation regarding the combination of sub-networks was relatively brief, and we intend to address this in a revised version. Our argument that combining sub-networks does not produce qualitatively new types of pattern transformations -beyond those already described- is based on the dispersion relation. Although this relation was only detailed in the Supplementary Information, it is central to our argument and will therefore be moved to the main text. Below, we provide an outline of this argument:

      Our study identifies two distinct behaviors of the principal branch of the dispersion relation at large wavenumbers. Based on this, gene networks capable of pattern formation can be classified into two categories: networks of the first kind, where the real part of the principal branch diverges to infinity as the wavenumber increases; and networks of the second kind, where the real part of the principal branch converges to a positive finite value for large wavenumbers. Naturally, this argument applies to any gene network irrespective of which, or how many, sub-networks are used to build it.

      Any gene regulatory network capable of pattern formation falls into one of these two categories. We identified that networks of the first kind contain at least one Turing sub-network, whereas networks of the second kind include either an H sub-network or a noise-amplifying sub-network. In this way, the primary objective of our study -namely, achieving a topological classification of gene regulatory networks capable of pattern formation- is fulfilled. It is important to note that while the dispersion relation provides broad information about the possible resulting patterns a gene network topology can produce (e.g., periodic versus noisy), it does not specify the exact patterns that emerge for each particular set of parameter values.
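
      To make the notion of a principal branch concrete, the sketch below assumes the standard linearized form, in which the growth rates at wavenumber k are the eigenvalues of J − k²·diag(D), and computes the largest real part for a two-gene system. The matrix `J` and the diffusion coefficients `D` are hypothetical and are not claimed to correspond to any network in the article; they are chosen so that one gene product does not diffuse and activates itself, which gives a branch that converges to a positive finite value at large k.

```python
import cmath


def principal_branch(k, J, D):
    """Largest real part among eigenvalues of J - k^2 * diag(D) (2x2 case)."""
    a = J[0][0] - D[0] * k ** 2
    d = J[1][1] - D[1] * k ** 2
    b, c = J[0][1], J[1][0]
    trace = a + d
    det = a * d - b * c
    root = cmath.sqrt(trace ** 2 - 4 * det)
    return max(((trace + root) / 2).real, ((trace - root) / 2).real)


# Hypothetical network: gene 0 diffuses, gene 1 does not and activates itself.
J = [[-1.0, 1.0], [-1.0, 0.5]]
D = [1.0, 0.0]
growth_at_large_k = principal_branch(100.0, J, D)
```

      For these illustrative values the branch approaches the self-activation term of the non-diffusing gene (0.5) as k grows. The sketch says nothing about which exact pattern emerges, in line with the caveat above.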

      Finally, regarding the shape of the resulting patterns, Figure S10 in the Supplementary Information exemplifies the notion that the behavior of combined networks can be understood as a combination of the individual behaviors of each constituent sub-network (note that the contribution of each type of sub-network in the resulting pattern is readily distinguishable). Consequently, we focus our detailed analysis on the patterning properties of the fundamental classes.

      (6) The manuscript lacks a clear and detailed explanation of the underlying model and its assumptions. In particular, it is not well-defined what constitutes a "cell" in the context of the model, nor is it justified why spatial features of cells -such as their size or boundaries- can be neglected. Furthermore, the concept of the extracellular space in the one-dimensional model remains ambiguous, making it unclear which gene products are assumed to diffuse.

      The size of cells is ignored in our model because we assume that they are small enough with respect to the total size of the domain that the space-continuous reaction-diffusion equation (equation (1) in the main text) holds. Conceptually, one could understand cells in our model as each of the pieces in an even partition of the domain into small subdomains surrounding each position x. This is, in any case, the standard procedure in most models of pattern formation by reaction-diffusion in embryonic development.

      For extracellular signals, we assume that g(t, x) corresponds to the concentration of the signal in the extracellular space surrounding the cell located at position x. The extracellular space is any fluid medium for which Fick's laws apply and, therefore, the Fickian diffusion term in equation (1) is valid.

      For intracellular gene products, we assume that g(t, x) corresponds to the concentration of the gene product within the cell at position x (if the gene product at hand is a transcription factor, for example), or on its surface (if it is a membrane-bound receptor). Once collapsed into the continuous equations, there is no difference between being strictly within the cell or on its boundary. The only important fact is that these gene products cannot diffuse.
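
      The distinction between diffusing extracellular signals and non-diffusing intracellular gene products can be sketched with a discretized diffusion step. The explicit Euler scheme, the zero-flux (mirrored) boundaries, the grid spacing, and all numerical values are assumptions made for illustration; the article's actual numerical scheme is not described here.

```python
def diffusion_step(g, D, dx, dt):
    """One explicit Euler step of Fickian diffusion with zero-flux boundaries.

    g[s][x] is the concentration of species s at grid position x.
    D[s] is its diffusion coefficient (0 for intracellular gene products).
    """
    new_g = []
    for conc, coeff in zip(g, D):
        n = len(conc)
        updated = []
        for x in range(n):
            left = conc[x - 1] if x > 0 else conc[x]       # mirrored boundary
            right = conc[x + 1] if x < n - 1 else conc[x]  # mirrored boundary
            laplacian = (left - 2 * conc[x] + right) / dx ** 2
            updated.append(conc[x] + dt * coeff * laplacian)
        new_g.append(updated)
    return new_g


# Hypothetical setup: species 0 is an extracellular signal, species 1 is
# intracellular. Both start as a spike in the middle cell.
spike = [0.0, 0.0, 1.0, 0.0, 0.0]
state = diffusion_step([spike[:], spike[:]], D=[1.0, 0.0], dx=1.0, dt=0.1)
```

      After one step the extracellular spike has spread to its neighbours while the total amount is conserved, and the intracellular spike is unchanged because its diffusion coefficient is zero.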

      Regarding cell boundaries, let us consider an extracellular signal s that regulates a transcription factor i within cells (in our model, i is an intracellular gene product). Such regulation shall be mediated by a membrane-bound receptor, which corresponds to intracellular gene product j. In terms of the gene regulatory network this is s→j→i. Cell boundary effects mentioned by the reviewer should be encapsulated in the specific functional form of the regulation function f(g), but they have no effect on the actual topology of the network. Consequently, they are out of the scope of this study: as we mentioned before, considering different non-linear terms for f(g) will affect the parameter range for which a gene network is capable of producing non-trivial pattern transformations, but not its overall ability to produce non-trivial pattern transformations (i.e., the existence of at least one choice of model parameters for which such transformations take place).

      Finally, we would like to once again express our sincere gratitude to all reviewers for their insightful and constructive feedback. We are confident that the thorough peer review process will significantly enhance both the clarity and depth of our work. We greatly value the detailed comments provided and will carefully incorporate them in the preparation of a revised manuscript, which we intend to submit in the coming months.

    1. Author Response

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Given knowledge of the amino acid sequence and of some version of the 3D structure of two monomers that are expected to form a complex, the authors investigate whether it is possible to accurately predict which residues will be in contact in the 3D structure of the expected complex. To this effect, they train a deep learning model that takes as inputs the geometric structures of the individual monomers, per-residue features (PSSMs) extracted from MSAs for each monomer, and rich representations of the amino acid sequences computed with the pre-trained protein language models ESM-1b, MSA Transformer, and ESM-IF. Predicting inter-protein contacts in complexes is an important problem. Multimer variants of AlphaFold, such as AlphaFold-Multimer, are the current state of the art for full protein complex structure prediction, and if the three-dimensional structure of a complex can be accurately predicted then the inter-protein contacts can also be accurately determined. By contrast, the method presented here seeks state-of-the-art performance among models that have been trained end-to-end for inter-protein contact prediction.

      Strengths:

      The paper is carefully written and the method is very well detailed. The model works both for homodimers and heterodimers. The ablation studies convincingly demonstrate that the chosen model architecture is appropriate for the task. Various comparisons suggest that PLMGraph-Inter performs substantially better, given the same input than DeepHomo, GLINTER, CDPred, DeepHomo2, and DRN-1D2D_Inter. As a byproduct of the analysis, a potentially useful heuristic criterion for acceptable contact prediction quality is found by the authors: namely, to have at least 50% precision in the prediction of the top 50 contacts.
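
      For readers unfamiliar with the top-50 precision criterion mentioned above, a minimal sketch of the metric follows. The data structures (a dictionary of scored residue pairs and a set of true contacts) and the function name are hypothetical; the exact contact definition and evaluation code used by the authors are not given in this text.

```python
def top_k_precision(scores, true_contacts, k=50):
    """Fraction of the k highest-scoring residue pairs that are true contacts.

    scores: dict mapping (i, j) residue pairs to predicted contact scores.
    true_contacts: set of (i, j) pairs that are contacts in the complex.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    if not ranked:
        return 0.0
    hits = sum(1 for pair in ranked if pair in true_contacts)
    return hits / len(ranked)


# Toy example with hypothetical scores; k is reduced to 2 for brevity.
scores = {(0, 0): 0.9, (0, 1): 0.8, (1, 0): 0.3, (1, 1): 0.1}
true_contacts = {(0, 0), (1, 0)}
precision = top_k_precision(scores, true_contacts, k=2)
```

      Under the heuristic described above, a prediction would count as acceptable when this value, computed with k=50, is at least 0.5. Dividing by the number of ranked pairs rather than by k is a design choice of this sketch for cases with fewer than k candidate pairs.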

      We thank the reviewer for recognizing the strengths of our work!

      Weaknesses:

      My biggest issue with this work is the evaluations made using bound monomer structures as inputs, coming from the very complexes to be predicted. Conformational changes in protein-protein association are the key element of the binding mechanism and are challenging to predict. While the GLINTER paper (Xie & Xu, 2022) is guilty of the same sin, the authors of CDPred (Guo et al., 2022) correctly only report test results obtained using predicted unbound tertiary structures as inputs to their model. Test results using experimental monomer structures in bound states can hide important limitations in the model, and thus say very little about the realistic use cases in which only the unbound structures (experimental or predicted) are available. I therefore strongly suggest reducing the importance given to the results obtained using bound structures and emphasizing instead those obtained using predicted monomer structures as inputs.

      We thank the reviewer for the suggestion! We evaluated PLMGraph-Inter with the predicted monomers and analyzed the result in detail (see the “Impact of the monomeric structure quality on contact prediction” section and Figure 3). To mimic real cases, we even deliberately reduced the performance of AF2 by using reduced MSAs (see the 2nd paragraph in the “Impact of the monomeric structure quality on contact prediction” section). We left some of the results in the supplementary material of the current manuscript (Table S2). We will move these results to the main text to emphasize the performance of PLMGraph-Inter with the predicted monomers in the revision.

      In particular, the most relevant comparison with AlphaFold-Multimer (AFM) is given in Figure S2, not Figure 6. Unfortunately, it substantially shrinks the proportion of structures for which AFM fails while PLMGraph-Inter performs decently. Still, it would be interesting to investigate why this occurs. One possibility would be that the predicted monomer structures are of bad quality there, and PLMGraph-Inter may be able to rely on a signal from its language model features instead. Finally, AFM multimer confidence values ("iptm + ptm") should be provided, especially in the cases in which AFM struggles.

      We thank the reviewer for the suggestion! Yes! The performance of PLMGraph-Inter drops when the predicted monomers are used in the prediction. However, it is difficult to say which is a fairer comparison, Figure 6 or Figure S2, since AFM also searched monomer templates (see the third paragraph in 7. Supplementary Information : 7.1 Data in the AlphaFold-Multimer preprint: https://www.biorxiv.org/content/10.1101/2021.10.04.463034v2.full) in the prediction. When we checked our AFM runs, we found that 99% of the targets in our study (including all the targets in the four datasets: HomoPDB, HeteroPDB, DHTest and DB5.5) employed at least 20 templates in their predictions, and 87.8% of the targets employed the native templates. We will provide the AFM confidence values of the AFM predictions in the revision.

      Besides, in cases where any experimental structures - bound or unbound - are available and given to PLMGraph-Inter as inputs, they should also be provided to AlphaFold-Multimer (AFM) as templates. Withholding these from AFM only makes the comparison artificially unfair. Hence, a new test should be run using AFM templates, and a new version of Figure 6 should be produced. Additionally, AFM's mean precision, at least for top-50 contact prediction, should be reported so it can be compared with PLMGraph-Inter's.

      We thank the reviewer for the suggestion! We would like to note that AFM also searched monomer templates (see the third paragraph in 7. Supplementary Information : 7.1 Data in the AlphaFold-Multimer preprint: https://www.biorxiv.org/content/10.1101/2021.10.04.463034v2.full) in the prediction. When we checked our AFM runs, we found that 99% of the targets in our study (including all the targets in the four datasets: HomoPDB, HeteroPDB, DHTest and DB5.5) employed at least 20 templates in their predictions, and 87.8% of the targets employed the native template.

      It's a shame that many of the structures used in the comparison with AFM are actually in the AFM v2 training set. If there are any outside the AFM v2 training set and, ideally, not sequence- or structure-homologous to anything in the AFM v2 training set, they should be discussed and reported on separately. In addition, why not test on structures from the "Benchmark 2" or "Recent-PDB-Multimers" datasets used in the AFM paper?

      We thank the reviewer for the suggestion! The biggest challenge in objectively evaluating AFM is that, as far as we know, AFM does not release the PDB IDs of its training set or of the “Recent-PDB-Multimers” dataset. “Benchmark 2” only includes 17 heterodimer proteins, and the number can decrease further after removing targets redundant to our training set. We think it is difficult to draw conclusions from such a small number of targets. In the revision, we will analyze the performance of AFM on targets released after the date cutoff of the AFM training set, although for these targets we cannot totally remove the redundancy between the training and the test sets of AFM.

      It is also worth noting that the AFM v2 weights have now been outdated for a while, and better v3 weights now exist, with a training cutoff of 2021-09-30.

      We thank the reviewer for reminding us of the new version of AFM. The only difference between AFM V3 and V2 is the cutoff date of the training set. Our test set would have more overlap with the training set of AFM V3, which is one reason we think AFM V2 is more appropriate for the comparison.

      Another weakness in the evaluation framework: because PLMGraph-Inter uses structural inputs, it is not sufficient to make its test set non-redundant in sequence to its training set. It must also be non-redundant in structure. The Benchmark 2 dataset mentioned above is an example of a test set constructed by removing structures with homologous templates in the AF2 training set. Something similar should be done here.

      We agree with the reviewer that testing whether the model can keep its performance on targets with no templates (i.e. non-redundant in structure) is important. We will perform the analysis in the revision.

      Finally, the performance of DRN-1D2D for top-50 precision reported in Table 1 suggests to me that, in an ablation study, language model features alone would yield better performance than geometric features alone. So, I am puzzled why model "a" in the ablation is a "geometry-only" model and not a "LM-only" one.

      Using the protein geometric graph to integrate multiple protein language models is the main idea of PLMGraph-Inter. Compared with our previous work (DRN-1D2D_Inter), we consider the building of the geometric graph to be one major contribution of this work. To emphasize the efficacy of this geometric graph, we chose to use the “geometry-only” model as the base model. We will further clarify this in the revision.

      Reviewer #2 (Public Review):

      This work introduces PLMGraph-Inter, a new deep-learning approach for predicting inter-protein contacts, which is crucial for understanding protein-protein interactions. Despite advancements in this field, especially driven by AlphaFold, prediction accuracy and efficiency (in terms of computational cost) still remain an area for improvement. PLMGraph-Inter utilizes invariant geometric graphs to integrate the features from multiple protein language models into the structural information of each subunit. When compared against other inter-protein contact prediction methods, PLMGraph-Inter shows better performance, which indicates that utilizing both sequence embeddings and structural embeddings is important to achieve high-accuracy predictions with relatively smaller computational costs for the model training.

      The conclusions of this paper are mostly well supported by data, but test examples should be revisited with a more strict sequence identity cutoff to avoid any potential information leakage from the training data. The main figures should be improved to make them easier to understand.

      We thank the reviewer for recognizing the significance of our work! We will revise the manuscript carefully to address the reviewer’s concerns.

      1. The sequence identity cutoff to remove redundancies between training and test set was set to 40%, which is a bit high to remove test examples having homology to training examples. For example, CDPred uses a sequence identity cutoff of 30% to strictly remove redundancies between training and test set examples. To make their results more solid, the authors should have curated test examples with lower sequence identity cutoffs, or have provided the performance changes against sequence identities to the closest training examples.

      We thank the reviewer for the valuable suggestion! Using different thresholds to reduce the redundancy between the test set and the training set is a very good suggestion, and we will perform the analysis in the revision. In the current version of the manuscript, the 40% sequence identity is used as the cutoff because many previous studies used this cutoff (e.g. the Recent-PDB-Multimers used in AlphaFold-Multimer (see: 7.8 Datasets in the AlphaFold-Multimer paper); the work of DSCRIPT: https://www.cell.com/action/showPdf?pii=S2405-4712%2821%2900333-1 (see: the PPI dataset paragraph in the METHODS DETAILS section of the STAR METHODS)). One reason for using the relatively higher threshold for PPI studies is that PPIs are generally not as conserved as protein monomers.

      We performed a preliminary analysis using different thresholds to remove redundancy when preparing this provisional response letter:

      Author response table 1.

      Table 1. The performance of PLMGraph-Inter on the HomoPDB and HeteroPDB test sets using native structures (AlphaFold2 predicted structures).

      Method:

      To remove redundancy, we clustered 11096 sequences from the training set and test sets (HomoPDB, HeteroPDB) using MMseqs2 with different sequence identity thresholds (40%, 30%, 20%, 10%) (the lowest cutoff for CD-HIT is 40%, so we switched to MMseqs2). Each sequence is then uniquely labeled by the cluster (e.g. cluster 0, cluster 1, …) to which it belongs, from which each PPI can be marked with a pair of clusters (e.g. cluster 0-cluster 1). PPIs belonging to the same cluster pair (note: cluster n-cluster m and cluster m-cluster n were considered the same pair) were considered redundant. For each PPI in the test set, if the cluster pair it belongs to contains a PPI from the training set, we remove that PPI from the test set.
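
      The filtering step described above can be sketched as follows, assuming the clustering has already been run and each sequence has been assigned a cluster label. The function names, the dictionary `cluster_of`, and the toy identifiers are hypothetical; this is not the authors' script.

```python
def cluster_pair(ppi, cluster_of):
    """Unordered pair of cluster labels for a PPI given as (seq_a, seq_b)."""
    a, b = ppi
    return tuple(sorted((cluster_of[a], cluster_of[b])))


def remove_redundant(test_ppis, train_ppis, cluster_of):
    """Drop test PPIs whose cluster pair also occurs in the training set."""
    train_pairs = {cluster_pair(p, cluster_of) for p in train_ppis}
    return [p for p in test_ppis
            if cluster_pair(p, cluster_of) not in train_pairs]


# Hypothetical cluster labels, e.g. parsed from a sequence-clustering run.
cluster_of = {"A": 0, "B": 1, "C": 1, "D": 0, "E": 2}
train = [("A", "B")]
test = [("C", "D"), ("A", "E")]
kept = remove_redundant(test, train, cluster_of)
```

      Sorting the two labels makes the pair order-independent, so in this toy case the test PPI ("C", "D") is removed because its cluster pair matches that of the training PPI ("A", "B"), while ("A", "E") is kept.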

      We will perform more detailed analyses in the revised manuscript.

      2. Figures with head-to-head comparison scatter plots are hard to understand as scatter plots because too many different methods are abstracted into a single plot with multiple colors. It would be better to provide individual head-to-head scatter plots as supplementary figures, not in the main figure.

      We thank the reviewer for the suggestion! We will include the individual head-to-head scatter plots as supplementary figures in the revision.

      3) The authors claim that PLMGraph-Inter is complementary to AlphaFold-multimer as it shows better precision for the cases where AlphaFold-multimer fails. To strengthen the point, the qualities of predicted complex structures via protein-protein docking with predicted contacts as restraints should have been compared to those of AlphaFold-multimer structures.

      We thank the reviewer for the suggestion! We will add this comparison in the revision.

      4) It would be interesting to further analyze whether there is a difference in prediction performance depending on the depth of multiple sequence alignment or the type of complex (antigen-antibody, enzyme-substrates, single species PPI, multiple species PPI, etc).

      We thank the reviewer for the suggestion! We will perform such analysis in the revision.

    1. Author response:

      eLife Assessment 

      This valuable study investigates how the neural representation of individual finger movements changes during the early period of sequence learning. By combining a new method for extracting features from human magnetoencephalography data and decoding analyses, the authors provide incomplete evidence of an early, swift change in the brain regions correlated with sequence learning, including a set of previously unreported frontal cortical regions. The addition of more control analyses to rule out that head movement artefacts influence the findings, and to further explain the proposal of offline contextualization during short rest periods as the basis for improvement performance would strengthen the manuscript. 

      We appreciate the Editorial assessment on our paper’s strengths and novelty.  We have implemented additional control analyses to show that neither task-related eye movements nor increasing overlap of finger movements during learning account for our findings, which are that contextualized neural representations in a network of bilateral frontoparietal brain regions actively contribute to skill learning.  Importantly, we carried out additional analyses showing that contextualization develops predominantly during rest intervals.

      Public Reviews:

      We thank the Reviewers for their comments and suggestions, prompting new analyses and additions that strengthened our report.

      Reviewer #1 (Public review): 

      Summary: 

      This study addresses the issue of rapid skill learning and whether individual sequence elements (here: finger presses) are differentially represented in human MEG data. The authors use a decoding approach to classify individual finger elements and accomplish an accuracy of around 94%. A relevant finding is that the neural representations of individual finger elements dynamically change over the course of learning. This would be highly relevant for any attempts to develop better brain machine interfaces - one now can decode individual elements within a sequence with high precision, but these representations are not static but develop over the course of learning. 

      Strengths: The work follows a large body of work from the same group on the behavioural and neural foundations of sequence learning. The behavioural task is well established and neatly designed to allow for tracking learning and how individual sequence elements contribute. The inclusion of short offline rest periods between learning epochs has been influential because it has revealed that a lot, if not most of the gains in behaviour (ie speed of finger movements) occur in these so-called micro-offline rest periods. The authors use a range of new decoding techniques, and exhaustively interrogate their data in different ways, using different decoding approaches. Regardless of the approach, impressively high decoding accuracies are observed, but when using a hybrid approach that combines the MEG data in different ways, the authors observe decoding accuracies of individual sequence elements from the MEG data of up to 94%. 

      We have previously shown that neural replay of MEG activity representing the practiced skill correlated with micro-offline gains during rest intervals of early learning1, consistent with the recent report that hippocampal ripples during these offline periods predict human motor sequence learning2.  However, decoding accuracy in our earlier work1 needed improvement.  Here, we report a strategy to improve decoding accuracy that could benefit future studies of neural replay or BCI using MEG.

      Weaknesses: 

      There are a few concerns which the authors may well be able to resolve. These are not weaknesses as such, but factors that would be helpful to address as these concern potential contributions to the results that one would like to rule out. Regarding the decoding results shown in Figure 2 etc, a concern is that within individual frequency bands, the highest accuracy seems to be within frequencies that match the rate of keypresses. This is a general concern when relating movement to brain activity, so is not specific to decoding as done here. As far as reported, there was no specific restraint to the arm or shoulder, and even then it is conceivable that small head movements would correlate highly with the vigor of individual finger movements. This concern is supported by the highest contribution in decoding accuracy being in middle frontal regions - midline structures that would be specifically sensitive to movement artefacts and don't seem to come to mind as key structures for very simple sequential keypress tasks such as this - and the overall pattern is remarkably symmetrical (despite being a unimanual finger task) and spatially broad. This issue may well be matching the time course of learning, as the vigor and speed of finger presses will also influence the degree to which the arm/shoulder and head move. This is not to say that useful information is contained within either of the frequencies or broadband data. But it raises the question of whether a lot is dominated by movement "artefacts" and one may get a more specific answer if removing any such contributions. 

      Reviewer #1 expresses concern that the combination of the low-frequency narrow-band decoder results, and the bilateral middle frontal regions displaying the highest average intra-parcel decoding performance across subjects is suggestive that the decoding results could be driven by head movement or other artefacts.

      Head movement artefacts are highly unlikely to contribute meaningfully to our results for the following reasons. First, in addition to ICA denoising, all “recordings were visually inspected and marked to denoise segments containing other large amplitude artifacts due to movements” (see Methods). Second, the response pad was positioned in a manner that minimized wrist, arm or more proximal body movements during the task. Third, while head position was not monitored online for this study, the head was restrained using an inflatable air bladder, and head position was assessed at the beginning and at the end of each recording. Head movement did not exceed 5mm between the beginning and end of each scan for all participants included in the study. Fourth, we agree that despite the steps taken above, it is possible that minor head movements could still contribute to some remaining variance in the MEG data in our study. The Reviewer states a concern that “it is conceivable that small head movements would correlate highly with the vigor of individual finger movements”. However, in order for any such correlations to meaningfully impact decoding performance, such head movements would need to: (A) be consistent and pervasive throughout the recording (which might not be the case if the head movements were related to movement vigor and vigor changed over time); and (B) systematically vary between different finger movements, and also between the same finger movement performed at different sequence locations (see 5-class decoding performance in Figure 4B). The possibility of any head movement artefacts meeting all these conditions is extremely unlikely.

      Given the task design, a much more likely confound in our estimation would be the contribution of eye movement artefacts to the decoder performance (an issue appropriately raised by Reviewer #3 in the comments below). Remember from Figure 1A in the manuscript that an asterisk marks the current position in the sequence and is updated at each keypress. Since participants make very few performance errors, the position of the asterisk on the display is highly correlated with the keypress being made in the sequence. Thus, it is possible that if participants are attending to the visual feedback provided on the display, they may move their eyes in a way that is systematically related to the task.  Since we did record eye movements simultaneously with the MEG recordings (EyeLink 1000 Plus; Fs = 600 Hz), we were able to perform a control analysis to address this question. For each keypress event during trials in which no errors occurred (which is the same time-point that the asterisk position is updated), we extracted three features related to eye movements: 1) the gaze position at the time of asterisk position update (or keyDown event), 2) the gaze position 150ms later, and 3) the peak velocity of the eye movement between the two positions. We then constructed a classifier from these features with the aim of predicting the location of the asterisk (ordinal positions 1-5) on the display. As shown in the confusion matrix below (Author response image 1), the classifier failed to perform above chance levels (Overall cross-validated accuracy = 0.21817):

      Author response image 1.

      Confusion matrix showing that three eye movement features fail to predict asterisk position on the task display above chance levels (Fold 1 test accuracy = 0.21718; Fold 2 test accuracy = 0.22023; Fold 3 test accuracy = 0.21859; Fold 4 test accuracy = 0.22113; Fold 5 test accuracy = 0.21373; Overall cross-validated accuracy = 0.2181). Since the ordinal position of the asterisk on the display is highly correlated with the ordinal position of individual keypresses in the sequence, this analysis provides strong evidence that keypress decoding performance from MEG features is not explained by systematic relationships between finger movement behavior and eye movements (i.e. – behavioral artefacts).
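
      As a quick arithmetic check on the figures quoted in this caption, the sketch below averages the five reported fold accuracies and compares the result with the chance level for five classes. The chance level of 1/5 assumes the five ordinal positions are equally represented, which is an assumption made here for illustration rather than something stated in the caption.

```python
# Fold accuracies as reported in the caption for Author response image 1.
fold_accuracies = [0.21718, 0.22023, 0.21859, 0.22113, 0.21373]

# Assumption: the five ordinal asterisk positions are balanced,
# so a classifier guessing at random would be correct 1 time in 5.
n_classes = 5
chance_level = 1.0 / n_classes

# Overall cross-validated accuracy taken as the unweighted mean over folds.
mean_accuracy = sum(fold_accuracies) / len(fold_accuracies)
margin_over_chance = mean_accuracy - chance_level

print(f"mean accuracy      = {mean_accuracy:.5f}")
print(f"chance level       = {chance_level:.5f}")
print(f"margin over chance = {margin_over_chance:.5f}")
```

      The unweighted mean is used because the caption does not report fold sizes; a weighted mean would differ slightly if the folds were unequal.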

      In fact, inspection of the eye position data revealed that a majority of participants on most trials displayed random walk gaze patterns around a center fixation point, indicating that participants did not attend to the asterisk position on the display. This is consistent with intrinsic generation of the action sequence, and congruent with the fact that the display does not provide explicit feedback related to performance. A similar real-world example would be manually inputting a long password into a secure online application. In this case, one intrinsically generates the sequence from memory and receives similar feedback about the password sequence position (also provided as asterisks), which is typically ignored by the user. The minimal participant engagement with the visual task display observed in this study highlights another important point – that the behavior in explicit sequence learning motor tasks is highly generative in nature rather than reactive to stimulus cues as in the serial reaction time task (SRTT).  This is a crucial difference that must be carefully considered when designing investigations and comparing findings across studies.

      We observed that initial keypress decoding accuracy was predominantly driven by contralateral primary sensorimotor cortex in the initial practice trials before transitioning to bilateral frontoparietal regions by trials 11 or 12 as performance gains plateaued.  The contribution of contralateral primary sensorimotor areas to early skill learning has been extensively reported in humans and non-human animals1,3-5.  Similarly, the increased involvement of bilateral frontal and parietal regions to decoding during early skill learning in the non-dominant hand is well known.  Enhanced bilateral activation in both frontal and parietal cortex during skill learning has been extensively reported6-11, and appears to be even more prominent during early fine motor skill learning in the non-dominant hand12,13.  The frontal regions identified in these studies are known to play crucial roles in executive control14, motor planning15, and working memory6,8,16-18 processes, while the same parietal regions are known to integrate multimodal sensory feedback and support visuomotor transformations6,8,16-18, in addition to working memory19. Thus, it is not surprising that these regions increasingly contribute to decoding as subjects internalize the sequential task.  We now include a statement reflecting these considerations in the revised Discussion.

      A somewhat related point is this: when combining voxel and parcel space, a concern is whether a degree of circularity may have contributed to the improved accuracy of the combined data, because it seems to use the same MEG signals twice - the voxels most contributing are also those contributing most to a parcel being identified as relevant, as parcels reflect the average of voxels within a boundary. In this context, I struggled to understand the explanation given, ie that the improved accuracy of the hybrid model may be due to "lower spatially resolved whole-brain and higher spatially resolved regional activity patterns".

      We strongly disagree with the Reviewer’s assertion that the construction of the hybrid-space decoder is circular. To clarify, the base feature set for the hybrid-space decoder constructed for all participants includes whole-brain spatial patterns of MEG source activity averaged within parcels. As stated in the manuscript, these 148 inter-parcel features reflect “lower spatially resolved whole-brain activity patterns” or global brain dynamics. We then independently test how well spatial patterns of MEG source activity for all voxels distributed within individual parcels can decode keypress actions. Again, the tests of these intra-parcel spatial patterns, intended to capture “higher spatially resolved regional brain activity patterns”, are completely independent of one another and independent of the weighting of individual inter-parcel features. These intra-parcel features could, for example, provide additional information about muscle activation patterns or the task environment. These approximately 1150 intra-parcel voxels (on average, with the total number varying between subjects) are then combined with the 148 inter-parcel features to construct the final hybrid-space decoder. In fact, this varied spatial filter approach shares some similarities with the construction of convolutional neural networks (CNNs) used to perform object recognition in image classification applications. One could also view this hybrid-space decoding approach as a spatial analogue to common time-frequency based analyses such as theta-gamma phase amplitude coupling (PAC), which combine information from two or more narrow-band spectral features derived from the same time-series data.

      We directly tested this hypothesis – that spatially overlapping intra- and inter-parcel features portray different information – by constructing an alternative hybrid-space decoder (HybridAlt) that excluded average inter-parcel features which spatially overlapped with intra-parcel voxel features, and comparing the performance to the decoder used in the manuscript (HybridOrig). The prediction was that if the overlapping parcel contained similar information to the more spatially resolved voxel patterns, then removing the parcel features (n=8) from the decoding analysis should not impact performance. In fact, despite making up less than 1% of the overall input feature space, removing those parcels resulted in a significant drop in overall performance greater than 2% (78.15% ± SD 7.03% for HybridOrig vs. 75.49% ± SD 7.17% for HybridAlt; Wilcoxon signed rank test, z = 3.7410, p = 1.8326e-04) (Author response image 2).

      Author response image 2.

      Comparison of decoding performances with two different hybrid approaches. HybridAlt: Intra-parcel voxel-space features of top ranked parcels and inter-parcel features of remaining parcels. HybridOrig:  Voxel-space features of top ranked parcels and whole-brain parcel-space features (i.e. – the version used in the manuscript). Dots represent decoding accuracy for individual subjects. Dashed lines indicate the trend in performance change across participants. Note, that HybridOrig (the approach used in our manuscript) significantly outperforms the HybridAlt approach, indicating that the excluded parcel features provide unique information compared to the spatially overlapping intra-parcel voxel patterns.
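
      The difference between the two feature sets compared in this figure can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the manuscript: the function name `build_hybrid_features`, the toy dimensions (12 voxels in 4 parcels) and the random data are all hypothetical, and parcel features are taken to be simple voxel averages as described in the response.

```python
import numpy as np


def build_hybrid_features(voxel_data, parcel_labels, top_parcels,
                          drop_overlapping=False):
    """Concatenate parcel-average features with voxel features of top parcels.

    voxel_data    : (n_samples, n_voxels) source activity (hypothetical input)
    parcel_labels : (n_voxels,) integer parcel id for each voxel
    top_parcels   : parcel ids whose individual voxels are kept as features
    drop_overlapping=False mimics HybridOrig (all parcel averages kept);
    drop_overlapping=True mimics HybridAlt (averages of top parcels removed).
    """
    parcel_ids = np.unique(parcel_labels)

    # Inter-parcel features: one average per parcel.
    parcel_means = np.column_stack(
        [voxel_data[:, parcel_labels == p].mean(axis=1) for p in parcel_ids]
    )
    if drop_overlapping:
        keep = ~np.isin(parcel_ids, top_parcels)
        parcel_means = parcel_means[:, keep]

    # Intra-parcel features: every voxel inside the top-ranked parcels.
    intra_voxels = voxel_data[:, np.isin(parcel_labels, top_parcels)]

    return np.hstack([parcel_means, intra_voxels])


# Toy data: 20 samples, 12 voxels grouped into 4 parcels of 3 voxels each.
rng = np.random.default_rng(0)
voxel_data = rng.normal(size=(20, 12))
parcel_labels = np.repeat(np.arange(4), 3)
top_parcels = [1]

hybrid_orig = build_hybrid_features(voxel_data, parcel_labels, top_parcels)
hybrid_alt = build_hybrid_features(voxel_data, parcel_labels, top_parcels,
                                   drop_overlapping=True)
print(hybrid_orig.shape, hybrid_alt.shape)
```

      In this toy setting the two feature matrices differ only by the single parcel-average column that overlaps with the retained voxels, which mirrors the small fraction of features removed in the HybridAlt comparison.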

      Firstly, there will be a relatively high degree of spatial contiguity among voxels because of the nature of the signal measured, i.e. nearby individual voxels are unlikely to be independent. Secondly, the voxel data gives a somewhat misleading sense of precision; the inversion can be set up to give an estimate for each voxel, but there will not just be dependence among adjacent voxels, but also substantial variation in the sensitivity and confidence with which activity can be projected to different parts of the brain. Midline and deeper structures come to mind, where the inversion will be more problematic than for regions along the dorsal convexity of the brain, and a concern is that in those midline structures, the highest decoding accuracy is seen. 

      We definitely agree with the Reviewer that some inter-parcel features representing neighboring (or spatially contiguous) voxels are likely to be correlated. This has been well documented in the MEG literature20,21 and is a particularly important confound to address in functional or effective connectivity analyses (not performed in the present study). In the present analysis, any correlation between adjacent voxels presents a multi-collinearity problem, which effectively reduces the dimensionality of the input feature space. However, as long as there are multiple groups of correlated voxels within each parcel (i.e. - the effective dimensionality is still greater than 1), the intra-parcel spatial patterns could still meaningfully contribute to the decoder performance. Two specific results support this assertion.

      First, we obtained higher decoding accuracy with voxel-space features [74.51% (± SD 7.34%)] compared to parcel-space features [68.77% (± SD 7.6%)] (Figure 3B), indicating that individual voxels carry more information for decoding the keypresses than the averaged voxel-space features or parcel-space features.  Second, individual voxels within a parcel showed varying feature importance scores in decoding keypresses (Author response image 3). This finding supports the Reviewer’s assertion that neighboring voxels express similar information, but also shows that the correlated voxels form mini subclusters that are much smaller spatially than the parcel they reside in.

      Author response image 3.

      Feature importance score of individual voxels in decoding keypresses: MRMR was used to rank the individual voxel space features in decoding keypresses and the min-max normalized MRMR score was mapped to a structural brain surface. Note that individual voxels within a parcel showed different contribution to decoding.
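
      The min-max normalization mentioned in this caption rescales scores to the range [0, 1]. A minimal sketch is given below; the function name and the example scores are hypothetical, and the handling of the constant-score case (returning zeros) is a choice made here, not something described in the response.

```python
import numpy as np


def min_max_normalize(scores):
    """Rescale scores linearly so the minimum maps to 0 and the maximum to 1."""
    scores = np.asarray(scores, dtype=float)
    lowest, highest = scores.min(), scores.max()
    if highest == lowest:
        # All scores identical: no spread to rescale, so return zeros.
        return np.zeros_like(scores)
    return (scores - lowest) / (highest - lowest)


# Hypothetical feature-importance scores for five voxels in one parcel.
example_scores = [0.10, 0.40, 0.25, 0.70, 0.10]
normalized = min_max_normalize(example_scores)
print(normalized)
```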

       

      Some of these concerns could be addressed by recording head movement (with enough precision) to regress out these contributions. The authors state that head movement was monitored with 3 fiducials, and their time courses ought to provide a way to deal with this issue. The ICA procedure may not have sufficiently dealt with removing movement-related problems, but one could eg relate individual components that were identified to the keypresses as another means for checking. An alternative could be to focus on frequency ranges above the movement frequencies. The accuracy for those still seems impressive and may provide a slightly more biologically plausible assessment. 

      We have already addressed the issue of movement related artefacts in the first response above. With respect to a focus on frequency ranges above movement frequencies, the Reviewer states the “accuracy for those still seems impressive and may provide a slightly more biologically plausible assessment”. First, it is important to note that cortical delta-band oscillations measured with local field potentials (LFPs) in macaques are known to contain important information related to end-effector kinematics22,23, muscle activation patterns24 and temporal sequencing25 during skilled reaching and grasping actions. Thus, there is a substantial body of evidence that low-frequency neural oscillatory activity in this range contains important information about the skill learning behavior investigated in the present study. Second, our own data show (as the Reviewer also points out) that significant information related to the skill learning behavior is also present in higher frequency bands (see Figure 2A and Figure 3—figure supplement 1). As we pointed out in our earlier response to questions about the hybrid space decoder architecture (see above), it is likely that different, yet complementary, information is encoded across different temporal frequencies (just as it is encoded across different spatial frequencies). Again, this interpretation is supported by our data as the highest performing classifiers in all cases (when holding all parameters constant) were always constructed from broadband input MEG data (Figure 2A and Figure 3—figure supplement 1).  

      One question concerns the interpretation of the results shown in Figure 4. They imply that during the course of learning, entirely different brain networks underpin the behaviour. Not only that, but they also include regions that would seem rather unexpected to be key nodes for learning and expressing relatively simple finger sequences, such as here. What then is the biological plausibility of these results? The authors seem to circumnavigate this issue by moving into a distance metric that captures the (neural network) changes over the course of learning, but the discussion seems detached from which regions are actually involved; or they offer a rather broad discussion of the anatomical regions identified here, eg in the context of LFOs, where they merely refer to "frontoparietal regions". 

      The Reviewer notes the shift in brain networks driving keypress decoding performance between trials 1, 11 and 36 as shown in Figure 4A. The Reviewer questions whether these substantial shifts in brain network states underpinning the skill are biologically plausible, as well as the likelihood that bilateral superior and middle frontal and parietal cortex are important nodes within these networks.

      First, previous fMRI work in humans performing a similar sequence learning task showed that flexibility in brain network composition (i.e. – changes in brain region members displaying coordinated activity) is up-regulated in novel learning environments and explains differences in learning rates across individuals26.  This work supports our interpretation of the present study data, that brain networks engaged in sequential motor skills rapidly reconfigure during early learning.

      Second, frontoparietal network activity is known to support motor memory encoding during early learning27,28. For example, reactivation events in the posterior parietal29 and medial prefrontal30,31 cortex (MPFC) have been temporally linked to hippocampal replay, and are posited to support memory consolidation across several memory domains32, including motor sequence learning1,33,34.  Further, synchronized interactions between MPFC and hippocampus are more prominent during early learning as opposed to later stages27,35,36, perhaps reflecting “redistribution of hippocampal memories to MPFC” 27.  MPFC contributes to very early memory formation by learning associations between contexts, locations, events and adaptive responses during rapid learning37. Consistently, coupling between hippocampus and MPFC has been shown during, and importantly immediately following (rest), initial memory encoding38,39.  Importantly, MPFC activity during initial memory encoding predicts subsequent recall40. Thus, the spatial map required to encode a motor sequence memory may be “built under the supervision of the prefrontal cortex” 28, also engaged in the development of an abstract representation of the sequence41.  In more abstract terms, the prefrontal, premotor and parietal cortices support novice performance “by deploying attentional and control processes” required during early learning42-44. The dorsolateral prefrontal cortex (DLPFC) specifically is thought to engage in goal selection and sequence monitoring during early skill practice45, all consistent with the schema model of declarative memory in which prefrontal cortices play an important role in encoding46,47.  Thus, several prefrontal and frontoparietal regions contributing to long term learning 48 are also engaged in early stages of encoding. Altogether, there is strong biological support for the involvement of bilateral prefrontal and frontoparietal regions to decoding during early skill learning.  
      We now address this issue in the revised manuscript.

      If I understand correctly, the offline neural representation analysis is in essence the comparison of the last keypress vs the first keypress of the next sequence. In that sense, the activity during offline rest periods is actually not considered. This makes the nomenclature somewhat confusing. While it matches the behavioural analysis, having only key presses one can't do it in any other way, but here the authors actually do have recordings of brain activity during offline rest. So at the very least calling it offline neural representation is misleading to this reviewer because what is compared is activity during the last and during the next keypress, not activity during offline periods. But it also seems a missed opportunity - the authors argue that most of the relevant learning occurs during offline rest periods, yet there is no attempt to actually test whether activity during this period can be useful for the questions at hand here. 

      We agree with the Reviewer that our previous “offline neural representation” nomenclature could be misinterpreted. In the revised manuscript we refer to this difference as the “offline neural representational change”. Please note that our previous work did link offline neural activity (i.e. – 16-22 Hz beta power and neural replay density during inter-practice rest periods) to observed micro-offline gains49.

      Reviewer #2 (Public review): 

      Summary 

      Dash et al. asked whether and how the neural representation of individual finger movements is "contextualized" within a trained sequence during the very early period of sequential skill learning by using decoding of MEG signal. Specifically, they assessed whether/how the same finger presses (pressing index finger) embedded in the different ordinal positions of a practiced sequence (4-1-3-2-4; here, the numbers 1 through 4 correspond to the little through the index fingers of the non-dominant left hand) change their representation (MEG feature). They did this by computing either the decoding accuracy of the index finger at the ordinal positions 1 vs. 5 (index_OP1 vs index_OP5) or pattern distance between index_OP1 vs. index_OP5 at each training trial and found that both the decoding accuracy and the pattern distance progressively increase over the course of learning trials. More interestingly, they also computed the pattern distance for index_OP5 for the last execution of a practice trial vs. index_OP1 for the first execution in the next practice trial (i.e., across the rest period). This "off-line" distance was significantly larger than the "on-line" distance, which was computed within practice trials and predicted micro-offline skill gain. Based on these results, the authors conclude that the differentiation of representation for the identical movement embedded in different positions of a sequential skill ("contextualization") primarily occurs during early skill learning, especially during rest, consistent with the recent theory of the "micro-offline learning" proposed by the authors' group. I think this is an important and timely topic for the field of motor learning and beyond.

      Strengths 

      The specific strengths of the current work are as follows. First, the use of temporally rich neural information (MEG signal) has a large advantage over previous studies testing sequential representations using fMRI. This allowed the authors to examine the earliest period (= the first few minutes of training) of skill learning with finer temporal resolution. Second, through the optimization of MEG feature extraction, the current study achieved extremely high decoding accuracy (approx. 94%) compared to previous works. As claimed by the authors, this is one of the strengths of the paper (but see my comments). Third, although some potential refinement might be needed, comparing "online" and "offline" pattern distance is a neat idea. 

      Weaknesses 

      Along with the strengths I raised above, the paper has some weaknesses. First, the pursuit of high decoding accuracy, especially the choice of time points and window length (i.e., 200 msec window starting from 0 msec from key press onset), casts a shadow on the interpretation of the main result. Currently, it is unclear whether the decoding results simply reflect behavioral change or true underlying neural change. As shown in the behavioral data, the key press speed reached 3~4 presses per second already at around the end of the early learning period (11th trial), which means inter-press intervals become as short as 250-330 msec. Thus, for more than about 60% of the training period data, the time window for MEG feature extraction (200 msec) spans around 60% of the inter-press intervals. Considering that the preparation/cueing of subsequent presses starts ahead of the actual press (e.g., Kornysheva et al., 2019) and/or potential online planning (e.g., Ariani and Diedrichsen, 2019), the decoder likely has captured this future-press information as well as the signal related to the current key press, independent of the formation of genuine sequential representation (e.g., "contextualization" of individual press). This may also explain the gradual increase in decoding accuracy or pattern distance between index_OP1 vs. index_OP5 (Figure 4C and 5A), which co-occurred with performance improvement, as shorter inter-press intervals are more favorable for dissociating the two index finger presses followed by different finger presses. The compromised decoding accuracies for the control sequences can be explained by similar logic. Therefore, more careful consideration and elaborated discussion seem necessary when trying to both achieve high-performance decoding and assess early skill learning, as it can impact all the subsequent analyses.
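
      The timing arithmetic behind this concern can be made explicit with a short sketch. It converts the keypress rates quoted above into inter-press intervals and expresses the 200 msec feature window as a fraction of each interval. The function name is hypothetical, and a constant press rate within a sequence is assumed for simplicity.

```python
def window_fraction(presses_per_second, window_ms=200.0):
    """Fraction of one inter-press interval covered by the feature window.

    Assumes evenly spaced presses, so the interval is 1000 / rate (in msec).
    """
    inter_press_interval_ms = 1000.0 / presses_per_second
    return window_ms / inter_press_interval_ms


# Rates quoted in the review: about 3 to 4 presses per second.
for rate in (3.0, 4.0):
    interval_ms = 1000.0 / rate
    fraction = window_fraction(rate)
    print(f"{rate:.0f} presses/s: interval = {interval_ms:.0f} msec, "
          f"window covers {fraction:.0%} of the interval")
```

      Under these assumptions the window covers roughly 60% of the interval at the slower rate and a larger share at the faster rate, which is the overlap the concern refers to.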

      The Reviewer raises the possibility that (given the windowing parameters used in the present study) an increase in “contextualization” with learning could simply reflect faster typing speeds as opposed to an actual change in the underlying neural representation. The issue can essentially be framed as a mixing problem. As correct sequences are generated at higher and higher speeds over training, MEG activity patterns related to the planning, execution, evaluation and memory of individual keypresses overlap more in time. Thus, increased overlap between the “4” and “1” keypresses (at the start of the sequence) and “2” and “4” keypresses (at the end of the sequence) could artefactually increase contextualization distances even if the underlying neural representations for the individual keypresses remain unchanged (assuming this mixing of representations is used by the classifier to differentially tag each index finger press). If this were the case, it follows that such mixing effects reflecting the ordinal sequence structure would also be observable in the distribution of decoder misclassifications. For example, “4” keypresses would be more likely to be misclassified as “1” or “2” keypresses (or vice versa) than as “3” keypresses. The confusion matrices presented in Figures 3C and 4B and Figure 3—figure supplement 3A in the previously submitted manuscript do not show this trend in the distribution of misclassifications across the four fingers.

      Moreover, if the representation distance is largely driven by this mixing effect, it’s also possible that the increased overlap between consecutive index finger keypresses during the 4-4 transition marking the end of one sequence and the beginning of the next one could actually mask contextualization-related changes to the underlying neural representations and make them harder to detect. In this case, a decoder tasked with separating individual index finger keypresses into two distinct classes based upon sequence position might show decreased performance with learning as adjacent keypresses overlapped in time with each other to an increasing extent. However, Figure 4C in our previously submitted manuscript does not support this possibility, as the 2-class hybrid classifier displays improved classification performance over early practice trials despite greater temporal overlap.

      We also conducted a new multivariate regression analysis to directly assess whether the neural representation distance score could be predicted by the 4-1, 2-4 and 4-4 keypress transition times observed for each complete correct sequence (both predictor and response variables were z-score normalized within-subject). The results of this analysis affirmed that the possible alternative explanation put forward by the Reviewer is not supported by our data (Adjusted R² = 0.00431; F = 5.62). We now include this new negative control analysis result in the revised manuscript.
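
      For readers who want to see the shape of such a negative control, the following is a minimal sketch and not the analysis code used in the study. It assumes ordinary least squares with an intercept, population-SD z-scoring within each subject, and hypothetical function names (`zscore_within_subject`, `solve`, `adjusted_r2`).

```python
import statistics

def zscore_within_subject(values, subjects):
    """Z-score each value against the mean and SD of its own subject.

    Assumes every subject has non-constant values (SD > 0).
    """
    out = [0.0] * len(values)
    for s in set(subjects):
        idx = [i for i, sub in enumerate(subjects) if sub == s]
        vals = [values[i] for i in idx]
        mu, sd = statistics.mean(vals), statistics.pstdev(vals)
        for i in idx:
            out[i] = (values[i] - mu) / sd
    return out

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * m for a, m in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def adjusted_r2(X, y):
    """Fit y ~ intercept + X by OLS (normal equations); return adjusted R^2.

    X is a list of rows, one row per sequence, e.g. the three transition
    times (4-1, 2-4, 4-4); y is the representation distance score.
    """
    n, p = len(y), len(X[0])
    Xd = [[1.0] + list(row) for row in X]  # prepend intercept column
    k = p + 1
    XtX = [[sum(r[i] * r[j] for r in Xd) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(Xd, y)) for i in range(k)]
    beta = solve(XtX, Xty)
    pred = [sum(b * x for b, x in zip(beta, r)) for r in Xd]
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ybar = statistics.mean(y)
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    # Penalize R^2 for the number of predictors p
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
```

      An adjusted R² close to zero, as reported above, means the transition times explain almost none of the variance in the distance score once the number of predictors is accounted for.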

      Overall, we do strongly agree with the Reviewer that the naturalistic, self-paced, generative task employed in the present study results in overlapping brain processes related to planning, execution, evaluation and memory of the action sequence. We also agree that there are several tradeoffs to consider in the construction of the classifiers depending on the study aim. Given our aim of optimizing keypress decoder accuracy in the present study, the set of trade-offs resulted in representations reflecting more the latter three processes, and less so the planning component. Whether separate decoders can be constructed to tease apart the representations or networks supporting these overlapping processes is an important future direction of research in this area. For example, work presently underway in our lab constrains the selection of windowing parameters in a manner that allows individual classifiers to be temporally linked to specific planning, execution, evaluation or memory-related processes to discern which brain networks are involved and how they adaptively reorganize with learning. Results from the present study (Figure 4—figure supplement 2) showing hybrid-space decoder prediction accuracies exceeding 74% for temporal windows spanning as little as 25ms and located up to 100ms prior to the keyDown event strongly support the feasibility of such an approach.

      Related to the above point, testing only one particular sequence (4-1-3-2-4), aside from the control ones, limits the generalizability of the finding. This also may have contributed to the extremely high decoding accuracy reported in the current study. 

      The Reviewer raises a question about the generalizability of the decoder accuracy reported in our study. Fortunately, a comparison between decoder performances on Day 1 and Day 2 datasets does provide some insight into this issue. As the Reviewer points out, the classifiers in this study were trained and tested on keypresses performed while practicing a specific sequence (4-1-3-2-4). The study was designed this way so as to avoid the impact of interference effects on learning dynamics. The cross-validated performance of classifiers on MEG data collected within the same session was 90.47% overall accuracy (4-class; Figure 3C). We then tested classifier performance on data collected during a separate MEG session conducted approximately 24 hours later (Day 2; see Figure 3—supplement 3). We observed a reduction in overall accuracy rate to 87.11% when tested on MEG data recorded while participants performed the same learned sequence, and 79.44% when they performed several previously unpracticed sequences. Both changes in accuracy are important with regard to the generalizability of our findings. First, 87.11% performance accuracy for the trained sequence data on Day 2 (a reduction of only 3.36 percentage points) indicates that the hybrid-space decoder performance is robust over multiple MEG sessions, and thus, robust to variations in SNR across the MEG sensor array caused by small differences in head position between scans. This indicates a substantial advantage over sensor-space decoding approaches. Furthermore, when tested on data from unpracticed sequences, overall performance dropped an additional 7.67 percentage points. This difference reflects the performance bias of the classifier for the trained sequence, possibly caused by high-order sequence structure being incorporated into the feature weights. In the future, it will be important to understand in more detail how random or repeated keypress sequence training data impacts overall decoder performance and generalization.
We strongly agree with the Reviewer that the issue of generalizability is extremely important and have added a new paragraph to the Discussion in the revised manuscript highlighting the strengths and weaknesses of our study with respect to this issue.

      In terms of clinical BCI, one of the potential relevance of the study, as claimed by the authors, it is not clear that the specific time window chosen in the current study (up to 200 msec since key press onset) is really useful. In most cases, clinical BCI would target neural signals with no overt movement execution due to patients' inability to move (e.g., Hochberg et al., 2012). Given the time window, the surprisingly high performance of the current decoder may result from sensory feedback and/or planning of subsequent movement, which may not always be available in the clinical BCI context. Of course, the decoding accuracy is still much higher than chance even when using signal before the key press (as shown in Figure 4 Supplement 2), but it is not immediately clear to me that the authors relate their high decoding accuracy based on post-movement signal to clinical BCI settings.

      The Reviewer questions the relevance of the specific window parameters used in the present study for clinical BCI applications, particularly for paretic patients who are unable to produce finger movements or for whom afferent sensory feedback is no longer intact. We strongly agree with the Reviewer that any intended clinical application must carefully consider these specific input feature constraints dictated by the clinical cohort, and in turn impose appropriate and complementary constraints on classifier parameters that may differ from the ones used in the present study. We now highlight this issue in the Discussion of the revised manuscript and relate our present findings to published clinical BCI work within this context.

      One of the important and fascinating claims of the current study is that the "contextualization" of individual finger movements in a trained sequence specifically occurs during short rest periods in very early skill learning, echoing the recent theory of micro-offline learning proposed by the authors' group. Here, I think two points need to be clarified. First, the concept of "contextualization" is kept somewhat blurry throughout the text. It is only at the later part of the Discussion (around line #330 on page 13) that some potential mechanism for the "contextualization" is provided as "what-and-where" binding. Still, it is unclear what "contextualization" actually is in the current data, as the MEG signal analyzed is extracted from 0-200 msec after the keypress. If one thinks something is contextualizing an action, that contextualization should come earlier than the action itself. 

      The Reviewer requests that we: 1) more clearly define our use of the term “contextualization” and 2) provide the rationale for assessing it over a 200ms window aligned to the keyDown event. This choice of window parameters means that the MEG activity used in our analysis was coincident with, rather than preceding, the actual keypresses.  We define contextualization as the differentiation of representation for the identical movement embedded in different positions of a sequential skill. That is, representations of individual action elements progressively incorporate information about their relationship to the overall sequence structure as the skill is learned. We agree with the Reviewer that this can be appropriately interpreted as “what-and-where” binding. We now incorporate this definition in the Introduction of the revised manuscript as requested.

      The window parameters for optimizing accurate decoding of individual finger movements were determined using a grid search of the parameter space (a sliding window of variable width between 25-350 ms with 25 ms increments variably aligned from 0 to +100ms with 10ms increments relative to the keyDown event). This approach generated 140 different temporal windows for each keypress for each participant, with the final parameter selection determined through comparison of the resulting performance between each decoder. Importantly, the decision to optimize for decoding accuracy placed an emphasis on keypress representations characterized by the most consistent and robust features shared across subjects, which in turn maximize statistical power in detecting common learning-related changes. In this case, the optimal window encompassed a 200ms epoch aligned to the keyDown event (t0 = 0 ms). We then asked if the representations (i.e., spatial patterns of combined parcel- and voxel-space activity) of the same digit at two different sequence positions changed with practice within this optimal decoding window. Of course, our findings do not rule out the possibility that contextualization can also be found before or even after this time window, as we did not directly address this issue in the present study. Ongoing work in our lab, as pointed out above, is investigating contextualization within different time windows tailored specifically for assessing sequence skill action planning, execution, evaluation and memory processes.

      The second point is that the result provided by the authors is not yet convincing enough to support the claim that "contextualization" occurs during rest. In the original analysis, the authors presented the statistical significance regarding the correlation between the "offline" pattern differentiation and micro-offline skill gain (Figure 5. Supplement 1), as well as the larger "offline" distance than "online" distance (Figure 5B). However, this analysis looks like regressing two variables (monotonically) increasing as a function of the trial. Although some information in this analysis, such as what the independent/dependent variables were or how individual subjects were treated, was missing in the Methods, getting a statistically significant slope seems unsurprising in such a situation. Also, curiously, the same quantitative evidence was not provided for its "online" counterpart, and the authors only briefly mentioned in the text that there was no significant correlation between them. It may be true looking at the data in Figure 5A as the online representation distance looks less monotonically changing, but the classification accuracy presented in Figure 4C, which should reflect similar representational distance, shows a more monotonic increase up to the 11th trial. Further, the ways the "online" and "offline" representation distance was estimated seem to make them not directly comparable. While the "online" distance was computed using all the correct press data within each 10 sec of execution, the "offline" distance is basically computed by only two presses (i.e., the last index_OP5 vs. the first index_OP1 separated by 10 sec of rest). Theoretically, the distance between the neural activity patterns for temporally closer events tends to be closer than that between the patterns for temporally far-apart events. It would be fairer to use the distance between the first index_OP1 vs. the last index_OP5 within an execution period for "online" distance, as well. 

      The Reviewer suggests that the current data is not convincing enough to show that contextualization occurs during rest and raises two important concerns: 1) the relationship between online contextualization and micro-online gains is not shown, and 2) the online distance was calculated differently from its offline counterpart (i.e., instead of calculating the distance between the last IndexOP5 and the first IndexOP1 from a single trial, the distance was calculated for each sequence within a trial and then averaged).

      We addressed the first concern by performing individual subject correlations between 1) contextualization changes during rest intervals and micro-offline gains; 2) contextualization changes during practice trials and micro-online gains, and 3) contextualization changes during practice trials and micro-offline gains (Author response image 4). We then statistically compared the resulting correlation coefficient distributions and found that within-subject correlations for contextualization changes during rest intervals and micro-offline gains were significantly higher than online contextualization and micro-online gains (t = 3.2827, p = 0.0015) and online contextualization and micro-offline gains (t = 3.7021, p = 5.3013e-04). These results are consistent with our interpretation that micro-offline gains are supported by contextualization changes during the inter-practice rest period.
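
      The two-step logic of this comparison (one correlation coefficient per subject, then a test across the resulting coefficient distributions) can be sketched as below. This is an illustration under stated assumptions, not the code used for the reported statistics: it assumes Pearson correlation and a paired t-statistic, and the function names and data layout are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def per_subject_r(data):
    """data maps subject id -> (contextualization_changes, skill_gains),
    each a per-trial list. Returns one correlation coefficient per subject."""
    return {s: pearson_r(c, g) for s, (c, g) in data.items()}

def paired_t(a, b):
    """Paired t-statistic for two lists of per-subject coefficients
    (same subjects, same order); degrees of freedom are len(a) - 1."""
    d = [p - q for p, q in zip(a, b)]
    n = len(d)
    mean = sum(d) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in d) / (n - 1))
    return mean / (sd / math.sqrt(n))
```

      In this sketch, `per_subject_r` would be called once per pairing (rest changes with micro-offline gains, practice changes with micro-online gains, practice changes with micro-offline gains), and `paired_t` would then compare two of the resulting coefficient lists.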

      Author response image 4.

      Distribution of individual subject correlation coefficients between contextualization changes occurring during practice or rest with micro-online and micro-offline performance gains. Note that the correlation distributions were significantly higher for the relationship between contextualization changes during rest and micro-offline gains than for contextualization changes during practice and either micro-online or offline gain.

      With respect to the second concern highlighted above, we agree with the Reviewer that one limitation of the analysis comparing online versus offline changes in contextualization as presented in the reviewed manuscript is that it does not eliminate the possibility that any differences could simply be explained by the passage of time (which is shorter for the online analysis than for the offline analysis). The Reviewer suggests an approach that addresses this issue, which we have now carried out. When quantifying online changes in contextualization from the first IndexOP1 to the last IndexOP5 keypress in the same trial, we observed no learning-related trend (Author response image 5, right panel). Importantly, offline distances were significantly larger than online distances regardless of the measurement approach, and neither online measure predicted online learning (Author response image 6).

      Author response image 5.

      Trial by trial trend of offline (left panel) and online (middle and right panels) changes in contextualization. Offline changes in contextualization were assessed by calculating the distance between neural representations for the last IndexOP5 keypress in the previous trial and the first IndexOP1 keypress in the present trial. Two different approaches were used to characterize online contextualization changes. The analysis included in the reviewed manuscript (middle panel) calculated the distance between IndexOP1 and IndexOP5 for each correct sequence, which was then averaged across the trial. This approach is limited by the lack of control for the passage of time when making online versus offline comparisons. Thus, the second approach controlled for the passage of time by calculating distance between the representations associated with the first IndexOP1 keypress and the last IndexOP5 keypress within the same trial. Note that while the first approach showed an increasing trend in online contextualization with practice, the second approach did not.
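
      The three distance definitions described in this caption can be made concrete with a short sketch. It is a hypothetical illustration only: the Euclidean metric, the dictionary layout (`"op1"` and `"op5"` feature vectors per correct sequence) and the function names are assumptions, not the study's implementation.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two feature vectors (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# A trial is a list of correct sequences in temporal order. Each sequence
# is a dict holding the feature vector of the index-finger keypress at
# ordinal position 1 ("op1") and at ordinal position 5 ("op5").

def offline_distance(prev_trial, trial):
    """Last IndexOP5 of the previous trial vs. first IndexOP1 of this trial."""
    return euclidean(prev_trial[-1]["op5"], trial[0]["op1"])

def online_within_sequence(trial):
    """IndexOP1 vs. IndexOP5 within each correct sequence, averaged over
    the trial (the measure described for the middle panel)."""
    d = [euclidean(s["op1"], s["op5"]) for s in trial]
    return sum(d) / len(d)

def online_across_trial(trial):
    """First IndexOP1 vs. last IndexOP5 within the same trial (the
    time-controlled measure described for the right panel)."""
    return euclidean(trial[0]["op1"], trial[-1]["op5"])
```

      Written this way, the offline and across-trial online measures each compare a single pair of keypresses separated by a comparable interval, whereas the within-sequence measure averages over many closely spaced pairs.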

      Author response image 6.

      Relationship between online contextualization and online learning is shown for both within-sequence (left; note that this is the online contextualization measure used in the reviewed manuscript) and across-sequence (right) distance calculation. There was no significant relationship between online learning and online contextualization regardless of the measurement approach.

      A related concern regarding the control analysis, where individual values for max speed and the degree of online contextualization were compared (Figure 5 Supplement 3), is whether the individual difference is meaningful. If I understood correctly, the optimization of the decoding process (temporal window, feature inclusion/reduction, decoder, etc.) was performed for individual participants, and the same feature extraction was also employed for the analysis of representation distance (i.e., contextualization). If this is the case, the distances are individually differently calculated and they may need to be normalized relative to some stable reference (e.g., 1 vs. 4 or average distance within the control sequence presses) before comparison across the individuals. 

      The Reviewer makes a good point here. We have now implemented the suggested normalization procedure in the analysis provided in the revised manuscript.

      Reviewer #3 (Public review): 

      Summary: 

      One goal of this paper is to introduce a new approach for highly accurate decoding of finger movements from human magnetoencephalography data via dimension reduction of a "multi-scale, hybrid" feature space. Following this decoding approach, the authors aim to show that early skill learning involves "contextualization" of the neural coding of individual movements, relative to their position in a sequence of consecutive movements. Furthermore, they aim to show that this "contextualization" develops primarily during short rest periods interspersed with skill training and correlates with a performance metric which the authors interpret as an indicator of offline learning.

      Strengths: 

      A clear strength of the paper is the innovative decoding approach, which achieves impressive decoding accuracies via dimension reduction of a "multi-scale, hybrid space". This hybrid-space approach follows the neurobiologically plausible idea of the concurrent distribution of neural coding across local circuits as well as large-scale networks. A further strength of the study is the large number of tested dimension reduction techniques and classifiers (though the manuscript reveals little about the comparison of the latter). 

      We appreciate the Reviewer’s comments regarding the paper’s strengths.

      A simple control analysis based on shuffled class labels could lend further support to this complex decoding approach. As a control analysis that completely rules out any source of overfitting, the authors could test the decoder after shuffling class labels. Following such shuffling, decoding accuracies should drop to chance level for all decoding approaches, including the optimized decoder. This would also provide an estimate of actual chance-level performance (which is informative over and beyond the theoretical chance level). Furthermore, currently, the manuscript does not explain the huge drop in decoding accuracies for the voxel-space decoding (Figure 3B). Finally, the authors' approach to cortical parcellation raises questions regarding the information carried by varying dipole orientations within a parcel (which currently seems to be ignored?) and the implementation of the mean-flipping method (given that there are two dimensions - space and time - what do the authors refer to when they talk about the sign of the "average source", line 477?). 

      The Reviewer recommends that we: 1) conduct an additional control analysis on classifier performance using shuffled class labels, 2) provide a more detailed explanation regarding the drop in decoding accuracies for the voxel-space decoding following LDA dimensionality reduction (see Fig 3B), and 3) provide additional details on how problems related to dipole solution orientations were addressed in the present study.  

      In relation to the first point, we have now implemented a random shuffling approach as a control for the classification analyses. The results of this analysis indicated that the chance level accuracy was 22.12% (± SD 9.1%) for individual keypress decoding (4-class classification), and 18.41% (± SD 7.4%) for individual sequence item decoding (5-class classification), irrespective of the input feature set or the type of decoder used. Thus, the decoding accuracy observed with the final model was substantially higher than these chance levels.  
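
      A label-shuffling control of this kind can be sketched as follows. The example is illustrative only and makes several assumptions that are not taken from the study: a toy nearest-centroid classifier stands in for the actual decoders, the data are synthetic two-dimensional clusters, and all names and parameter values are hypothetical.

```python
import random

def nearest_centroid_accuracy(train_X, train_y, test_X, test_y):
    """Fit one centroid per class on the training set; score on the test set."""
    classes = sorted(set(train_y))
    centroids = {}
    for c in classes:
        pts = [x for x, y in zip(train_X, train_y) if y == c]
        centroids[c] = [sum(col) / len(pts) for col in zip(*pts)]

    def predict(x):
        return min(classes,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

    hits = sum(predict(x) == y for x, y in zip(test_X, test_y))
    return hits / len(test_y)

def shuffled_label_chance(train_X, train_y, test_X, test_y, n_perm=200, seed=0):
    """Empirical chance level: retrain on permuted training labels n_perm
    times and return the mean and SD of the resulting test accuracies."""
    rng = random.Random(seed)
    accs = []
    for _ in range(n_perm):
        perm = train_y[:]
        rng.shuffle(perm)  # breaks the feature-label relationship
        accs.append(nearest_centroid_accuracy(train_X, perm, test_X, test_y))
    mean = sum(accs) / n_perm
    sd = (sum((a - mean) ** 2 for a in accs) / (n_perm - 1)) ** 0.5
    return mean, sd

# Synthetic 4-class data: well-separated clusters standing in for keypresses.
centers = {1: (0, 0), 2: (10, 0), 3: (0, 10), 4: (10, 10)}

def make_data(rng, n_per_class):
    X, y = [], []
    for label, (cx, cy) in centers.items():
        for _ in range(n_per_class):
            X.append([rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)])
            y.append(label)
    return X, y

data_rng = random.Random(1)
train_X, train_y = make_data(data_rng, 20)
test_X, test_y = make_data(data_rng, 10)

true_acc = nearest_centroid_accuracy(train_X, train_y, test_X, test_y)
chance_mean, chance_sd = shuffled_label_chance(train_X, train_y, test_X, test_y)
```

      With balanced classes, the expected accuracy after shuffling is the theoretical chance level (25% for four classes). The empirical mean and SD show how far a finite dataset can deviate from it, which is the quantity reported above.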

      Second, please note that the dimensionality of the voxel-space feature set is very high (i.e., 15684). LDA attempts to map the input features onto a much smaller dimensional space (number of classes − 1; e.g., 3 dimensions for 4-class keypress decoding). Given the very high dimension of the voxel-space input features in this case, the resulting mapping exhibits reduced accuracy. Despite this general consideration, please refer to Figure 3—figure supplement 3, where we observe improvement in voxel-space decoder performance when utilizing alternative dimensionality reduction techniques.
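
      The "number of classes − 1" limit follows from the standard definition of the LDA between-class scatter matrix. The short derivation below uses textbook notation (C classes, n_c samples and mean μ_c in class c, grand mean μ, N samples, p features); the symbols are not taken from the manuscript.

```latex
S_B = \sum_{c=1}^{C} n_c \,(\mu_c - \mu)(\mu_c - \mu)^{\top},
\qquad
\sum_{c=1}^{C} n_c \,(\mu_c - \mu) = 0
\;\Longrightarrow\;
\operatorname{rank}(S_B) \le C - 1 .
```

      The C mean-difference vectors satisfy one linear constraint, so they span at most C − 1 directions, which gives 3 discriminant axes for C = 4 whatever the value of p. In addition, the within-class scatter matrix has rank at most N − C, so it is singular whenever p exceeds N − C. This is one commonly cited reason why LDA can become unstable on very high-dimensional inputs such as a 15684-feature voxel space, although whether it is the main factor here is not established by the text above.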

      The decoders constructed in the present study assess the average spatial patterns across time (as defined by the windowing procedure) in the input feature space.  We now provide additional details in the Methods of the revised manuscript pertaining to the parcellation procedure and how the sign ambiguity problem was addressed in our analysis.

      Weaknesses: 

      A clear weakness of the paper lies in the authors' conclusions regarding "contextualization". Several potential confounds, described below, question the neurobiological implications proposed by the authors and provide a simpler explanation of the results. Furthermore, the paper follows the assumption that short breaks result in offline skill learning, while recent evidence, described below, casts doubt on this assumption. 

      We thank the Reviewer for giving us the opportunity to address these issues in detail (see below).

      The authors interpret the ordinal position information captured by their decoding approach as a reflection of neural coding dedicated to the local context of a movement (Figure 4). One way to dissociate ordinal position information from information about the moving effectors is to train a classifier on one sequence and test the classifier on other sequences that require the same movements, but in different positions (ref. 50). In the present study, however, participants trained to repeat a single sequence (4-1-3-2-4). As a result, ordinal position information is potentially confounded by the fixed finger transitions around each of the two critical positions (first and fifth press). Across consecutive correct sequences, the first keypress in a given sequence was always preceded by a movement of the index finger (=last movement of the preceding sequence), and followed by a little finger movement. The last keypress, on the other hand, was always preceded by a ring finger movement, and followed by an index finger movement (=first movement of the next sequence). Figure 4 - Supplement 2 shows that finger identity can be decoded with high accuracy (>70%) across a large time window around the time of the key press, up to at least +/-100 ms (and likely beyond, given that decoding accuracy is still high at the boundaries of the window depicted in that figure). This time window approaches the keypress transition times in this study. Given that distinct finger transitions characterized the first and fifth keypress, the classifier could thus rely on persistent (or "lingering") information from the preceding finger movement, and/or "preparatory" information about the subsequent finger movement, in order to dissociate the first and fifth keypress. 
Currently, the manuscript provides no evidence that the context information captured by the decoding approach is more than a by-product of temporally extended, and therefore overlapping, but independent neural representations of consecutive keypresses that are executed in close temporal proximity - rather than a neural representation dedicated to context. 

      Such temporal overlap of consecutive, independent finger representations may also account for the dynamics of "ordinal coding"/"contextualization", i.e., the increase in 2-class decoding accuracy, across Day 1 (Figure 4C). As learning progresses, both tapping speed and the consistency of keypress transition times increase (Figure 1), i.e., consecutive keypresses are closer in time, and more consistently so. As a result, information related to a given keypress is increasingly overlapping in time with information related to the preceding and subsequent keypresses. The authors seem to argue that their regression analysis in Figure 5 - Figure Supplement 3 speaks against any influence of tapping speed on "ordinal coding" (even though that argument is not made explicitly in the manuscript). However, Figure 5 - Figure Supplement 3 shows inter-individual differences in a between-subject analysis (across trials, as in panel A, or separately for each trial, as in panel B), and, therefore, says little about the within-subject dynamics of "ordinal coding" across the experiment. A regression of trial-by-trial "ordinal coding" on trial-by-trial tapping speed (either within-subject or at a group-level, after averaging across subjects) could address this issue. Given the highly similar dynamics of "ordinal coding" on the one hand (Figure 4C), and tapping speed on the other hand (Figure 1B), I would expect a strong relationship between the two in the suggested within-subject (or group-level) regression. Furthermore, learning should increase the number of (consecutively) correct sequences, and, thus, the consistency of finger transitions. 
Therefore, the increase in 2-class decoding accuracy may simply reflect an increasing overlap in time of increasingly consistent information from consecutive keypresses, which allows the classifier to dissociate the first and fifth keypress more reliably as learning progresses, simply based on the characteristic finger transitions associated with each. In other words, given that the physical context of a given keypress changes as learning progresses - keypresses move closer together in time and are more consistently correct - it seems problematic to conclude that the mental representation of that context changes. To draw that conclusion, the physical context should remain stable (or any changes to the physical context should be controlled for). 

      The issues raised by Reviewer #3 here are similar to two issues raised by Reviewer #2 above, and we agree they must both be carefully considered in any evaluation of our findings.

      As both Reviewers pointed out, the classifiers in this study were trained and tested on keypresses performed while practicing a specific sequence (4-1-3-2-4). The study was designed this way so as to avoid the impact of interference effects on learning dynamics. The cross-validated performance of classifiers on MEG data collected within the same session was 90.47% overall accuracy (4-class; Figure 3C). We then tested classifier performance on data collected during a separate MEG session conducted approximately 24 hours later (Day 2; see Figure 3—supplement 3). We observed a reduction in overall accuracy rate to 87.11% when tested on MEG data recorded while participants performed the same learned sequence, and 79.44% when they performed several previously unpracticed sequences. This classification performance difference of 7.67 percentage points when tested on the Day 2 data could reflect the performance bias of the classifier for the trained sequence, possibly caused by mixed information from temporally close keypresses being incorporated into the feature weights.

      Along these same lines, both Reviewers also raise the possibility that an increase in “ordinal coding/contextualization” with learning could simply reflect an increase in this mixing effect caused by faster typing speeds as opposed to an actual change in the underlying neural representation. The basic idea is that as correct sequences are generated at higher and higher speeds over training, MEG activity patterns related to the planning, execution, evaluation and memory of individual keypresses overlap more in time. Thus, increased overlap between the “4” and “1” keypresses (at the start of the sequence) and “2” and “4” keypresses (at the end of the sequence) could artefactually increase contextualization distances even if the underlying neural representations for the individual keypresses remain unchanged (assuming this mixing of representations is used by the classifier to differentially tag each index finger press). If this were the case, it follows that such mixing effects reflecting the ordinal sequence structure would also be observable in the distribution of decoder misclassifications. For example, “4” keypresses would be more likely to be misclassified as “1” or “2” keypresses (or vice versa) than as “3” keypresses. The confusion matrices presented in Figures 3C and 4B and Figure 3—figure supplement 3A in the previously submitted manuscript do not show this trend in the distribution of misclassifications across the four fingers.

      Following this logic, it’s also possible that if the ordinal coding is largely driven by this mixing effect, the increased overlap between consecutive index finger keypresses during the 4-4 transition marking the end of one sequence and the beginning of the next one could actually mask contextualization-related changes to the underlying neural representations and make them harder to detect. In this case, a decoder tasked with separating individual index finger keypresses into two distinct classes based upon sequence position might show decreased performance with learning as adjacent keypresses overlapped in time with each other to an increasing extent. However, Figure 4C in our previously submitted manuscript does not support this possibility, as the 2-class hybrid classifier displays improved classification performance over early practice trials despite greater temporal overlap.

      As noted in the above reply to Reviewer #2, we also conducted a new multivariate regression analysis to directly assess whether the neural representation distance score could be predicted by the 4-1, 2-4 and 4-4 keypress transition times observed for each complete correct sequence (both predictor and response variables were z-score normalized within-subject). The results of this analysis affirmed that the possible alternative explanation put forward by the Reviewer is not supported by our data (Adjusted R² = 0.00431; F = 5.62). We now include this new negative control analysis result in the revised manuscript.

      Finally, the Reviewer hints that one way to address this issue would be to compare MEG responses before and after learning for sequences typed at a fixed speed. However, given that the speed-accuracy trade-off should improve with learning, a comparison between unlearned and learned skill states would dictate that the skill be evaluated at a very low fixed speed. Essentially, such a design presents the problem that the post-training test is evaluating the representation in the unlearned behavioral state that is not representative of the acquired skill. Thus, this approach would not address our experimental question: “do neural representations of the same action performed at different locations within a skill sequence contextually differentiate or remain stable as learning evolves”.

      A similar difference in physical context may explain why neural representation distances ("differentiation") differ between rest and practice (Figure 5). The authors define "offline differentiation" by comparing the hybrid space features of the last index finger movement of a trial (ordinal position 5) and the first index finger movement of the next trial (ordinal position 1). However, the latter is not only the first movement in the sequence but also the very first movement in that trial (at least in trials that started with a correct sequence), i.e., not preceded by any recent movement. In contrast, the last index finger of the last correct sequence in the preceding trial includes the characteristic finger transition from the fourth to the fifth movement. Thus, there is more overlapping information arising from the consistent, neighbouring keypresses for the last index finger movement, compared to the first index finger movement of the next trial. A strong difference (larger neural representation distance) between these two movements is, therefore, not surprising, given the task design, and this difference is also expected to increase with learning, given the increase in tapping speed, and the consequent stronger overlap in representations for consecutive keypresses. Furthermore, initiating a new sequence involves pre-planning, while ongoing practice relies on online planning (Ariani et al., eNeuro 2021), i.e., two mental operations that are dissociable at the level of neural representation (Ariani et al., bioRxiv 2023). 

      The Reviewer argues that the last finger movement of a trial and the first finger movement of the next trial are performed in different circumstances and contexts. This is an important point and one we tend to agree with. For this task, the first sequence in a practice trial (which is pre-planned offline) is performed in a somewhat different context from the sequence iterations that follow, which involve temporally overlapping planning, execution and evaluation processes. The Reviewer is particularly concerned that the temporal mixing effect raised above differs between the first and last keypresses performed in a trial. However, in contrast to the Reviewer's stated argument above, findings from Kornysheva et al. (2019) showed that neural representations of individual actions are competitively queued during the pre-planning period in a manner that reflects the ordinal structure of the learned sequence. Thus, mixing effects are likely still present for the first keypress in a trial. Also note that we now present new control analyses in multiple responses above confirming that hypothetical mixing effects between adjacent keypresses do not explain our reported contextualization finding. A statement addressing these possibilities raised by the Reviewer has been added to the Discussion in the revised manuscript.

      In relation to pre-planning, ongoing MEG work in our lab is investigating contextualization within different time windows tailored specifically for assessing how sequence skill action planning evolves with learning.

      Given these differences in the physical context and associated mental processes, it is not surprising that "offline differentiation", as defined here, is more pronounced than "online differentiation". For the latter, the authors compared movements that were better matched regarding the presence of consistent preceding and subsequent keypresses (online differentiation was defined as the mean difference between all first vs. last index finger movements during practice).  It is unclear why the authors did not follow a similar definition for "online differentiation" as for "micro-online gains" (and, indeed, a definition that is more consistent with their definition of "offline differentiation"), i.e., the difference between the first index finger movement of the first correct sequence during practice, and the last index finger of the last correct sequence. While these two movements are, again, not matched for the presence of neighbouring keypresses (see the argument above), this mismatch would at least be the same across "offline differentiation" and "online differentiation", so they would be more comparable. 

      This is the same point made earlier by Reviewer #2, and we agree with this assessment. As stated in the response to Reviewer #2 above, we have now carried out quantification of online contextualization using this approach and included it in the revised manuscript. We thank the Reviewer for this suggestion.
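      To make the two definitions under discussion concrete, the following is a minimal sketch of how an "online" distance (first vs. last index finger keypress within the same practice trial) and an "offline" distance (last index finger keypress of one trial vs. first index finger keypress of the next trial) could be computed. The Euclidean metric, the function names and the toy two-dimensional feature vectors are assumptions made purely for illustration; they are not the hybrid-space features or the distance measure used in the manuscript.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def online_differentiation(trials):
    """Per-trial distance between the first (ordinal position 1) and
    last (ordinal position 5) index finger keypress features."""
    return [euclidean(t["first"], t["last"]) for t in trials]

def offline_differentiation(trials):
    """Distance between the last index finger keypress of trial n and
    the first index finger keypress of trial n + 1 (across the rest)."""
    return [euclidean(prev["last"], nxt["first"])
            for prev, nxt in zip(trials, trials[1:])]

# Hypothetical 2-D features for three practice trials (illustration only).
trials = [
    {"first": [0.0, 0.0], "last": [3.0, 4.0]},
    {"first": [0.0, 0.0], "last": [6.0, 8.0]},
    {"first": [1.0, 1.0], "last": [1.0, 1.0]},
]
online = online_differentiation(trials)    # one value per practice trial
offline = offline_differentiation(trials)  # one value per rest period
print(online, offline)
```

      Defining both measures on the same pair of keypress types (first and last index finger movements) is what makes the online and offline values directly comparable, which is the point raised by the Reviewer.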

      A further complication in interpreting the results regarding "contextualization" stems from the visual feedback that participants received during the task. Each keypress generated an asterisk shown above the string on the screen, irrespective of whether the keypress was correct or incorrect. As a result, incorrect (e.g., additional, or missing) keypresses could shift the phase of the visual feedback string (of asterisks) relative to the ordinal position of the current movement in the sequence (e.g., the fifth movement in the sequence could coincide with the presentation of any asterisk in the string, from the first to the fifth). Given that more incorrect keypresses are expected at the start of the experiment, compared to later stages, the consistency in visual feedback position, relative to the ordinal position of the movement in the sequence, increased across the experiment. A better differentiation between the first and the fifth movement with learning could, therefore, simply reflect better decoding of the more consistent visual feedback, based either on the feedback-induced brain response, or feedback-induced eye movements (the study did not include eye tracking). It is not clear why the authors introduced this complicated visual feedback in their task, besides consistency with their previous studies.

      We strongly agree with the Reviewer that eye movements related to task engagement are important to rule out as a potential driver of the decoding accuracy or contextualization effect. We address this issue above in response to a question raised by Reviewer #1 about the impact of movement related artefacts in general on our findings.

      First, the assumption the Reviewer makes here about the distribution of errors in this task is incorrect. On average across subjects, 2.32% ± 1.48% (mean ± SD) of all keypresses performed were errors, which were evenly distributed across the four possible keypress responses. While errors increased progressively over practice trials, they did so in proportion to the increase in correct keypresses, so that the overall ratio of correct-to-incorrect keypresses remained stable over the training session. Thus, the Reviewer’s assumptions that there is a higher relative frequency of errors in early trials, and a resulting systematic trend in phase shifts between the visual display updates (i.e., a change in asterisk position above the displayed sequence) and the keypress performed, are not substantiated by the data. To the contrary, the asterisk position on the display and the keypress being executed remained highly correlated over the entire training session. We now include a statement about the frequency and distribution of errors in the revised manuscript.

      Given this high correlation, we firmly agree with the Reviewer that the issue of eye movement-related artefacts is still an important one to address. Fortunately, we did collect eye movement data during the MEG recordings, so we were able to investigate this. As detailed in the response to Reviewer #1 above, we found that gaze positions and eye-movement velocity time-locked to visual display updates (i.e., a change in asterisk position above the displayed sequence) did not reflect the asterisk location above chance levels (Overall cross-validated accuracy = 0.21817; see Author response image 1). Furthermore, an inspection of the eye position data revealed that a majority of participants on most trials displayed random walk gaze patterns around a center fixation point, indicating that participants did not attend to the asterisk position on the display. This is consistent with intrinsic generation of the action sequence, and congruent with the fact that the display does not provide explicit feedback related to performance. As pointed out above, a similar real-world example would be manually inputting a long password into a secure online application. In this case, one intrinsically generates the sequence from memory and receives similar feedback about the password sequence position (also provided as asterisks), which is typically ignored by the user. Notably, the minimal participant engagement with the visual task display observed in this study highlights an important difference between behavior observed during explicit sequence learning motor tasks (which is highly generative in nature) and reactive responses to stimulus cues in a serial reaction time task (SRTT). This is a crucial difference that must be carefully considered when comparing findings across studies. All elements pertaining to this new control analysis are now included in the revised manuscript.
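      One common way to ask whether a decoding accuracy exceeds chance is a label-permutation test. The sketch below is a simplified version that shuffles the true labels against a fixed set of predictions (a full analysis would typically re-train the classifier on each permutation). The toy labels, the helper names and the number of permutations are assumptions for illustration, and this is not necessarily the procedure used for the eye-tracking control analysis.

```python
import random

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def permutation_p_value(y_true, y_pred, n_perm=2000, seed=0):
    """Return (observed accuracy, p-value) where the null distribution is
    built by shuffling the true labels relative to the predictions."""
    rng = random.Random(seed)
    observed = accuracy(y_true, y_pred)
    shuffled = list(y_true)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if accuracy(shuffled, y_pred) >= observed:
            hits += 1
    # +1 correction so the estimated p-value is never exactly zero
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical labels: five asterisk positions, 20 samples each.
y_true = [i % 5 for i in range(100)]
informative = list(y_true)   # a decoder that tracks the labels perfectly
uninformative = [0] * 100    # a decoder that always outputs one position

acc_inf, p_inf = permutation_p_value(y_true, informative)
acc_unf, p_unf = permutation_p_value(y_true, uninformative)
print(acc_inf, p_inf)
print(acc_unf, p_unf)
```

      In this toy case the constant decoder scores the same accuracy under every shuffle, so its permutation p-value cannot indicate above-chance decoding, whereas the informative decoder yields a small p-value.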

      The authors report a significant correlation between "offline differentiation" and cumulative micro-offline gains. However, it would be more informative to correlate trial-by-trial changes in each of the two variables. This would address the question of whether there is a trial-by-trial relation between the degree of "contextualization" and the amount of micro-offline gains - are performance changes (micro-offline gains) less pronounced across rest periods for which the change in "contextualization" is relatively low? Furthermore, is the relationship between micro-offline gains and "offline differentiation" significantly stronger than the relationship between micro-offline gains and "online differentiation"? 

      In response to a similar issue raised above by Reviewer #2, we now include new analyses comparing correlation magnitudes between (1) “online differentiation” vs micro-online gains, (2) “online differentiation” vs micro-offline gains and (3) “offline differentiation” vs micro-offline gains (see Author response images 4, 5 and 6 above). These new analyses and results have been added to the revised manuscript. Once again, we thank both Reviewers for this suggestion.

      The authors follow the assumption that micro-offline gains reflect offline learning.

      This statement is incorrect. The original Bönstrup et al. (2019) 49 paper clearly states that micro-offline gains must be carefully interpreted based upon the behavioral context within which they are observed, and lays out the conditions under which one can have confidence that micro-offline gains reflect offline learning. In fact, the excellent meta-analysis of Pan & Rickard (2015) 51, which re-interprets the benefits of sleep in overnight skill consolidation from a “reactive inhibition” perspective, was a crucial resource in the experimental design of our initial study49, as well as in all our subsequent work. Pan & Rickard stated:

      “Empirically, reactive inhibition refers to performance worsening that can accumulate during a period of continuous training (Hull, 1943). It tends to dissipate, at least in part, when brief breaks are inserted between blocks of training. If there are multiple performance-break cycles over a training session, as in the motor sequence literature, performance can exhibit a scalloped effect, worsening during each uninterrupted performance block but improving across blocks52,53. Rickard, Cai, Rieth, Jones, and Ard (2008) and Brawn, Fenn, Nusbaum, and Margoliash (2010) 52,53 demonstrated highly robust scalloped reactive inhibition effects using the commonly employed 30 s–30 s performance break cycle, as shown for Rickard et al.’s (2008) massed practice sleep group in Figure 2. The scalloped effect is evident for that group after the first few 30 s blocks of each session. The absence of the scalloped effect during the first few blocks of training in the massed group suggests that rapid learning during that period masks any reactive inhibition effect.”

      Crucially, Pan & Rickard51 made several concrete recommendations for reducing the impact of the reactive inhibition confound on offline learning studies. One of these recommendations was to reduce practice times to 10s (most prior sequence learning studies up until that point had employed 30s long practice trials). They stated:

      “The traditional design involving 30 s-30 s performance break cycles should be abandoned given the evidence that it results in a reactive inhibition confound, and alternative designs with reduced performance duration per block used instead 51. One promising possibility is to switch to 10 s performance durations for each performance-break cycle Instead 51. That design appears sufficient to eliminate at least the majority of the reactive inhibition effect 52,53.”

      We mindfully incorporated recommendations from Pan and Rickard51  into our own study designs including 1) utilizing 10s practice trials and 2) constraining our analysis of micro-offline gains to early learning trials (where performance monotonically increases and 95% of overall performance gains occur), which are prior to the emergence of the “scalloped” performance dynamics that are strongly linked to reactive inhibition effects. 
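      For readers less familiar with these measures, the sketch below shows one way micro-online and micro-offline gains could be computed from per-trial performance and restricted to an early-learning window. It assumes that each practice trial is summarized by a speed estimate at its start and at its end; the function name, the `early_trials` parameter and the toy speeds are hypothetical and are not taken from our analysis code.

```python
def micro_gains(trials, early_trials=None):
    """Compute micro-online and micro-offline gains.

    trials: list of (start_speed, end_speed) tuples, one per practice
            trial, e.g. in keypresses/s.
    early_trials: if given, only the first `early_trials` trials are used.
    Returns (micro_online, micro_offline) as two lists.
    """
    if early_trials is not None:
        trials = trials[:early_trials]
    # Change in performance within each practice trial
    micro_online = [end - start for start, end in trials]
    # Change in performance across each rest period between trials
    micro_offline = [nxt[0] - prev[1] for prev, nxt in zip(trials, trials[1:])]
    return micro_online, micro_offline

# Hypothetical per-trial speeds (start, end), for illustration only.
speeds = [(1.0, 1.5), (2.0, 2.0), (2.5, 2.25), (2.75, 3.0)]
online, offline = micro_gains(speeds, early_trials=3)
total_learning = speeds[2][1] - speeds[0][0]
print(online, offline, total_learning)
```

      By construction, the micro-online and micro-offline gains within the selected window sum to the total performance change over that window, which is why their relative contributions to early learning can be compared directly.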

      However, there is no direct evidence in the literature that micro-offline gains really result from offline learning, i.e., an improvement in skill level.

      We strongly disagree with the Reviewer’s assertion that “there is no direct evidence in the literature that micro-offline gains really result from offline learning, i.e., an improvement in skill level.”  The initial Bönstrup et al. (2019) 49 report was followed up by a large online crowd-sourcing study (Bönstrup et al., 2020) 54. This second (and much larger) study provided several additional important findings supporting our interpretation of micro-offline gains in cases where the important behavioral conditions clarified above were met (see Author response image 7 below for further details on these conditions).

      Author response image 7.

      Micro-offline gains observed in learning and non-learning contexts are attributed to different underlying causes. (A) Micro-offline and online changes relative to overall trial-by-trial learning. This figure is based on data from Bönstrup et al. (2019) 49. During early learning, micro-offline gains (red bars) closely track trial-by-trial performance gains (green line with open circle markers), with minimal contribution from micro-online gains (blue bars). The stated conclusion in Bönstrup et al. (2019) is that micro-offline gains only during this Early Learning stage reflect rapid memory consolidation (see also 54). After early learning, about practice trial 11, skill plateaus. This plateau skill period is characterized by a striking emergence of coupled (and relatively stable) micro-online drops and micro-offline increases. Bönstrup et al. (2019) as well as others in the literature 55-57, argue that micro-offline gains during the plateau period likely reflect recovery from inhibitory performance factors such as reactive inhibition or fatigue, and thus must be excluded from analyses relating micro-offline gains to skill learning.  The Non-repeating groups in Experiments 3 and 4 from Das et al. (2024) suffer from a lack of consideration of these known confounds.

      Evidence documented in that paper54 showed that micro-offline gains during early skill learning were: 1) replicable and generalized to subjects learning the task in their daily living environment (n=389); 2) equivalent when significantly shortening practice period duration, thus confirming that they are not a result of recovery from performance fatigue (n=118);  3) reduced (along with learning rates) by retroactive interference applied immediately after each practice period relative to interference applied after passage of time (n=373), indicating stabilization of the motor memory at a microscale of several seconds consistent with rapid consolidation; and 4) not modified by random termination of the practice periods, ruling out a contribution of predictive motor slowing (N = 71) 54.  Altogether, our findings were strongly consistent with the interpretation that micro-offline gains reflect memory consolidation supporting early skill learning. This is precisely the portion of the learning curve Pan and Rickard51 refer to when they state “…rapid learning during that period masks any reactive inhibition effect”.

      This interpretation is further supported by brain imaging evidence linking known memory-related networks and consolidation mechanisms to micro-offline gains. First, we reported that the density of fast hippocampo-neocortical skill memory replay events increases approximately three-fold during early learning inter-practice rest periods, with the density explaining differences in the magnitude of micro-offline gains across subjects1. Second, Jacobacci et al. (2020) independently reproduced our original behavioral findings and reported BOLD fMRI changes in the hippocampus and precuneus (regions also identified in our MEG study1) linked to micro-offline gains during early skill learning33. These functional changes were coupled with rapid alterations in brain microstructure on the order of minutes, suggesting that the same network that operates during rest periods of early learning undergoes structural plasticity over several minutes following practice58. Third, even more recently, Chen et al. (2024) provided direct evidence from intracranial EEG in humans linking sharp-wave ripple events (which are known markers for neural replay59) in the hippocampus (80-120 Hz in humans) with micro-offline gains during early skill learning. The authors report that the strong increase in ripple rates tracked learning behavior, both across blocks and across participants. The authors conclude that hippocampal ripples during resting offline periods contribute to motor sequence learning2.

      Thus, there is actually now substantial evidence in the literature directly supporting the assertion “that micro-offline gains really result from offline learning”.  On the contrary, according to Gupta & Rickard (2024) “…the mechanism underlying RI [reactive inhibition] is not well established” after over 80 years of investigation60, possibly due to the fact that “reactive inhibition” is a categorical description of behavioral effects that likely result from several heterogenous processes with very different underlying mechanisms.

      On the contrary, recent evidence questions this interpretation (Gupta & Rickard, npj Sci Learn 2022; Gupta & Rickard, Sci Rep 2024; Das et al., bioRxiv 2024). Instead, there is evidence that micro-offline gains are transient performance benefits that emerge when participants train with breaks, compared to participants who train without breaks, however, these benefits vanish within seconds after training if both groups of participants perform under comparable conditions (Das et al., bioRxiv 2024). 

      It is important to point out that the recent work of Gupta & Rickard (2022, 2024) 55 does not present any data that directly opposes our finding that early skill learning49 is expressed as micro-offline gains during rest breaks. These studies are essentially an extension of the Rickard et al. (2008) paper, which employed a massed (30s practice followed by 30s breaks) vs. spaced (10s practice followed by 10s breaks) design to assess if recovery from reactive inhibition effects could account for performance gains measured after several minutes or hours. Gupta & Rickard (2022) added two additional groups (30s practice/10s break and 10s practice/10s break as used in the work from our group). The primary aim of the study was to assess whether changes in performance when retested 5 minutes after skill training had ended (training consisted of 12 practice trials for the massed groups and 36 practice trials for the spaced groups) were more likely to reflect memory consolidation effects or recovery from reactive inhibition effects. The Gupta & Rickard (2024) follow-up paper employed a similar design with the primary difference being that participants performed a fixed number of sequences on each trial as opposed to trials lasting a fixed duration. This was done to facilitate the fitting of a quantitative statistical model to the data. To reiterate, neither study included any analysis of micro-online or micro-offline gains, or any comparison focused on skill gains during early learning. Instead, Gupta & Rickard (2022) reported evidence for reactive inhibition effects for all groups over much longer training periods. Again, we reported the same finding for trials following the early learning period in our original Bönstrup et al. (2019) paper49 (Author response image 7). Also, please note that we reported in this paper that cumulative micro-offline gains over early learning did not correlate with overnight offline consolidation measured 24 hours later49 (see the Results section and further elaboration in the Discussion). Thus, while the composition of our data is supportive of a short-term memory consolidation process operating over several seconds during early learning, it likely differs from those involved over longer training times and offline periods, as assessed by Gupta & Rickard (2022).

      In the recent preprint from Das et al. (2024) 61, the authors make the strong claim that “micro-offline gains during early learning do not reflect offline learning”, a claim that is not supported by their own data. The authors hypothesize that if “micro-offline gains represent offline learning, participants should reach higher skill levels when training with breaks, compared to training without breaks”. The study utilizes a spaced vs. massed practice group between-subjects design inspired by the reactive inhibition work from Rickard and others to test this hypothesis. Crucially, the design incorporates only a small fraction of the training used in other investigations to evaluate early skill learning1,33,49,54,57,58,62. A direct comparison between the practice schedule designs for the spaced and massed groups in Das et al., and the training schedule all participants experienced in the original Bönstrup et al. (2019) paper highlights this issue as well as several others (Author response image 8):

      Author response image 8.

      (A) Comparison of Das et al. Spaced & Massed group training session designs, and the training session design from the original Bönstrup et al. (2019) 49 paper. Similar to the approach taken by Das et al., all practice is visualized as 10-second practice trials with a variable number (either 0, 1 or 30) of 10-second-long inter-practice rest intervals to allow for direct comparisons between designs. The two key takeaways from this comparison are that (1) the intervention differences (i.e. – practice schedules) between the Massed and Spaced groups from the Das et al. report are extremely small (less than 12% of the overall session schedule) and (2) the overall amount of practice is much less than compared to the design from the original Bönstrup report 49  (which has been utilized in several subsequent studies). (B) Group-level learning curve data from Bönstrup et al. (2019) 49 is used to estimate the performance range accounted for by the equivalent periods covering Test 1, Training 1 and Test 2 from Das et al (2024). Note that the intervention in the Das et al. study is limited to a period covering less than 50% of the overall learning range.

      First, participants in the original Bönstrup et al. study 49 experienced 157.14% more practice time and 46.97% less inter-practice rest time than the Spaced group in the Das et al. study (Author response image 8).  Thus, the overall amount of practice and rest differ substantially between studies, with much more limited training occurring for participants in Das et al.  

      Second, and perhaps most importantly, the actual intervention (i.e., the difference in practice schedule between the Spaced and Massed groups) employed by Das et al. covers a very small fraction of the overall training session. Identical practice schedule segments for both the Spaced & Massed groups are indicated by the red shaded area in Author response image 8. Please note that these identical segments cover 94.84% of the Massed group training schedule and 88.01% of the Spaced group training schedule (since it has 60 seconds of additional rest). This means that the actual interventions cover only about 5% (for Massed) and 12% (for Spaced) of the total training session, which minimizes any chance of observing a difference between groups.
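      The size of the intervention follows directly from the identical-segment percentages quoted above, as the short calculation below illustrates. Only the two percentages come from the text; the variable names are ours and purely illustrative.

```python
# Percentage of each group's schedule that is identical across groups
# (values quoted in the text above).
identical_pct = {"Massed": 94.84, "Spaced": 88.01}

# The intervention is whatever remains of each schedule.
intervention_pct = {
    group: round(100.0 - pct, 2) for group, pct in identical_pct.items()
}

for group, pct in intervention_pct.items():
    print(f"{group}: intervention covers {pct:.2f}% of the session")
```

      The complements are 5.16% for the Massed group and 11.99% for the Spaced group, which is the basis for describing the intervention as a very small fraction of the training session.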

      Also note that the very beginning of the practice schedule (during which Figure R9 shows substantial learning is known to occur) is labeled in the Das et al. study as Test 1.  Test 1 encompasses the first 20 seconds of practice (alternatively viewed as the first two 10-second-long practice trials with no inter-practice rest). This is immediately followed by the Training 1 intervention, which is composed of only three 10-second-long practice trials (with 10-second inter-practice rest for the Spaced group and no inter-practice rest for the Massed group). Author response image 8 also shows that since there is no inter-practice rest after the third Training practice trial for the Spaced group, this third trial (for both Training 1 and 2) is actually a part of an identical practice schedule segment shared by both groups (Massed and Spaced), reducing the magnitude of the intervention even further.

      Moreover, we know from the original Bönstrup et al. (2019) paper49 that 46.57% of all overall group-level performance gains occurred between trials 2 and 5 for that study. Thus, Das et al. are limiting their designed intervention to a period covering less than half of the early learning range discussed in the literature, which again, minimizes any chance of observing an effect.

      This issue is amplified even further at Training 2 since skill learning prior to the long 5-minute break is retained, further constraining the performance range over these three trials. A related issue pertains to the trials labeled as Test 1 (trials 1-2) and Test 2 (trials 6-7) by Das et al. Again, we know from the original Bönstrup et al. paper 49 that 18.06% and 14.43% (32.49% total) of all overall group-level performance gains occurred during trials corresponding to Das et al. Test 1 and Test 2, respectively. In other words, Das et al. averaged skill performance over 20 seconds of practice at two time-points where dramatic skill improvements occur. Pan & Rickard (2015) previously showed that such averaging can inject artefacts into analyses of performance gains.
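      The averaging concern can be illustrated with a toy example. The sketch below assumes a hypothetical exponential learning curve (the asymptote, rate and time windows are arbitrary choices, not values fitted to any dataset) and compares the mean performance over a 20-second window with the performance actually reached at the end of that window, once during steep early improvement and once near plateau.

```python
import math

def skill(t, asymptote=4.0, rate=0.05):
    """Hypothetical exponential learning curve (keypresses/s) at time t (s)."""
    return asymptote * (1.0 - math.exp(-rate * t))

def window_mean(t0, t1, step=1.0):
    """Mean skill sampled every `step` seconds in the closed window [t0, t1]."""
    n = int(round((t1 - t0) / step)) + 1
    return sum(skill(t0 + i * step) for i in range(n)) / n

early_mean = window_mean(0, 20)     # 20-s window during steep improvement
early_end = skill(20)               # skill reached by the end of that window
late_mean = window_mean(200, 220)   # same window length near plateau
late_end = skill(220)

early_gap = early_end - early_mean  # how far the average lags the end value
late_gap = late_end - late_mean
print(early_gap, late_gap)
```

      Under these assumptions the window average lags the end-of-window skill far more during rapid early learning than near plateau, so comparisons built on such averages can misstate the size of performance changes.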

      Furthermore, the structure of the Test in the Das et al. study appears to have an interference effect on the Spaced group performance after the training intervention. This makes sense considering that the Spaced group is now required to perform the task in a Massed practice environment (i.e., two 10-second-long practice trials merged into one long trial), further blurring the true intervention effects. This effect is observable in Figure 1C,E of their pre-print. Specifically, while the Massed group continues to show an increase in performance during test relative to the last 10 seconds of practice during training, the Spaced group displays a marked decrease. This decrease is in stark contrast to the monotonic increases observed for both groups at all other time-points.

      Interestingly, when statistical comparisons between the groups are made at the time-points when the intervention is present (as opposed to after it has been removed) then the stated hypothesis, “If micro-offline gains represent offline learning, participants should reach higher skill levels when training with breaks, compared to training without breaks”, is confirmed.

      The data presented by Gupta and Rickard (2022, 2024) and Das et al. (2024) is in many ways more confirmatory of the constraints employed by our group and others with respect to experimental design, analysis and interpretation of study findings, rather than contradictory. Still, it does highlight a limitation of the current micro-online/offline framework, which was originally only intended to be applied to early skill learning over spaced practice schedules when reactive inhibition effects are minimized49. Extrapolation of this current framework to post-plateau performance periods, longer timespans, or non-learning situations (e.g. – the Non-repeating groups from Experiments 3 & 4 in Das et al. (2024)), when reactive inhibition plays a more substantive role, is not warranted. Ultimately, it will be important to develop new paradigms allowing one to independently estimate the different coincident or antagonistic features (e.g. - memory consolidation, planning, working memory and reactive inhibition) contributing to micro-online and micro-offline gains during and after early skill learning within a unifying framework.

      References

      (1) Buch, E. R., Claudino, L., Quentin, R., Bönstrup, M. & Cohen, L. G. Consolidation of human skill linked to waking hippocampo-neocortical replay. Cell Rep 35, 109193 (2021). https://doi.org:10.1016/j.celrep.2021.109193

      (2) Chen, P.-C., Stritzelberger, J., Walther, K., Hamer, H. & Staresina, B. P. Hippocampal ripples during offline periods predict human motor sequence learning. bioRxiv, 2024.2010.2006.614680 (2024). https://doi.org:10.1101/2024.10.06.614680

      (3) Classen, J., Liepert, J., Wise, S. P., Hallett, M. & Cohen, L. G. Rapid plasticity of human cortical movement representation induced by practice. J Neurophysiol 79, 1117-1123 (1998).

      (4) Karni, A. et al. Functional MRI evidence for adult motor cortex plasticity during motor skill learning. Nature 377, 155-158 (1995). https://doi.org:10.1038/377155a0

      (5) Kleim, J. A., Barbay, S. & Nudo, R. J. Functional reorganization of the rat motor cortex following motor skill learning. J Neurophysiol 80, 3321-3325 (1998).

      (6) Shadmehr, R. & Holcomb, H. H. Neural correlates of motor memory consolidation. Science 277, 821-824 (1997).

      (7) Doyon, J. et al. Experience-dependent changes in cerebellar contributions to motor sequence learning. Proc Natl Acad Sci U S A 99, 1017-1022 (2002).

      (8) Toni, I., Ramnani, N., Josephs, O., Ashburner, J. & Passingham, R. E. Learning arbitrary visuomotor associations: temporal dynamic of brain activity. Neuroimage 14, 1048-1057 (2001).

      (9) Grafton, S. T. et al. Functional anatomy of human procedural learning determined with regional cerebral blood flow and PET. J Neurosci 12, 2542-2548 (1992).

      (10) Kennerley, S. W., Sakai, K. & Rushworth, M. F. Organization of action sequences and the role of the pre-SMA. J Neurophysiol 91, 978-993 (2004). https://doi.org:10.1152/jn.00651.2003

      (11) Hardwick, R. M., Rottschy, C., Miall, R. C. & Eickhoff, S. B. A quantitative meta-analysis and review of motor learning in the human brain. Neuroimage 67, 283-297 (2013). https://doi.org:10.1016/j.neuroimage.2012.11.020

      (12) Sawamura, D. et al. Acquisition of chopstick-operation skills with the non-dominant hand and concomitant changes in brain activity. Sci Rep 9, 20397 (2019). https://doi.org:10.1038/s41598-019-56956-0

      (13) Lee, S. H., Jin, S. H. & An, J. The difference in cortical activation pattern for complex motor skills: A functional near-infrared spectroscopy study. Sci Rep 9, 14066 (2019). https://doi.org:10.1038/s41598-019-50644-9

      (14) Battaglia-Mayer, A. & Caminiti, R. Corticocortical Systems Underlying High-Order Motor Control. J Neurosci 39, 4404-4421 (2019). https://doi.org:10.1523/JNEUROSCI.2094-18.2019

      (15) Toni, I., Thoenissen, D. & Zilles, K. Movement preparation and motor intention. Neuroimage 14, S110-117 (2001). https://doi.org:10.1006/nimg.2001.0841

      (16) Wolpert, D. M., Goodbody, S. J. & Husain, M. Maintaining internal representations: the role of the human superior parietal lobe. Nat Neurosci 1, 529-533 (1998). https://doi.org:10.1038/2245

      (17) Andersen, R. A. & Buneo, C. A. Intentional maps in posterior parietal cortex. Annu Rev Neurosci 25, 189-220 (2002). https://doi.org:10.1146/annurev.neuro.25.112701.142922

      (18) Buneo, C. A. & Andersen, R. A. The posterior parietal cortex: sensorimotor interface for the planning and online control of visually guided movements. Neuropsychologia 44, 2594-2606 (2006). https://doi.org:10.1016/j.neuropsychologia.2005.10.011

      (19) Grover, S., Wen, W., Viswanathan, V., Gill, C. T. & Reinhart, R. M. G. Long-lasting, dissociable improvements in working memory and long-term memory in older adults with repetitive neuromodulation. Nat Neurosci 25, 1237-1246 (2022). https://doi.org:10.1038/s41593-022-01132-3

      (20) Colclough, G. L. et al. How reliable are MEG resting-state connectivity metrics? Neuroimage 138, 284-293 (2016). https://doi.org:10.1016/j.neuroimage.2016.05.070

      (21) Colclough, G. L., Brookes, M. J., Smith, S. M. & Woolrich, M. W. A symmetric multivariate leakage correction for MEG connectomes. NeuroImage 117, 439-448 (2015). https://doi.org:10.1016/j.neuroimage.2015.03.071

      (22) Mollazadeh, M. et al. Spatiotemporal variation of multiple neurophysiological signals in the primary motor cortex during dexterous reach-to-grasp movements. J Neurosci 31, 15531-15543 (2011). https://doi.org:10.1523/JNEUROSCI.2999-11.2011

      (23) Bansal, A. K., Vargas-Irwin, C. E., Truccolo, W. & Donoghue, J. P. Relationships among low-frequency local field potentials, spiking activity, and three-dimensional reach and grasp kinematics in primary motor and ventral premotor cortices. J Neurophysiol 105, 1603-1619 (2011). https://doi.org:10.1152/jn.00532.2010

      (24) Flint, R. D., Ethier, C., Oby, E. R., Miller, L. E. & Slutzky, M. W. Local field potentials allow accurate decoding of muscle activity. J Neurophysiol 108, 18-24 (2012). https://doi.org:10.1152/jn.00832.2011

      (25) Churchland, M. M. et al. Neural population dynamics during reaching. Nature 487, 51-56 (2012). https://doi.org:10.1038/nature11129

      (26) Bassett, D. S. et al. Dynamic reconfiguration of human brain networks during learning. Proc Natl Acad Sci U S A 108, 7641-7646 (2011). https://doi.org:10.1073/pnas.1018985108

      (27) Albouy, G., King, B. R., Maquet, P. & Doyon, J. Hippocampus and striatum: dynamics and interaction during acquisition and sleep-related motor sequence memory consolidation. Hippocampus 23, 985-1004 (2013). https://doi.org:10.1002/hipo.22183

      (28) Albouy, G. et al. Neural correlates of performance variability during motor sequence acquisition. Neuroimage 60, 324-331 (2012). https://doi.org:10.1016/j.neuroimage.2011.12.049

      (29) Qin, Y. L., McNaughton, B. L., Skaggs, W. E. & Barnes, C. A. Memory reprocessing in corticocortical and hippocampocortical neuronal ensembles. Philos Trans R Soc Lond B Biol Sci 352, 1525-1533 (1997). https://doi.org:10.1098/rstb.1997.0139

      (30) Euston, D. R., Tatsuno, M. & McNaughton, B. L. Fast-forward playback of recent memory sequences in prefrontal cortex during sleep. Science 318, 1147-1150 (2007). https://doi.org:10.1126/science.1148979

      (31) Molle, M. & Born, J. Hippocampus whispering in deep sleep to prefrontal cortex--for good memories? Neuron 61, 496-498 (2009). https://doi.org:10.1016/j.neuron.2009.02.002

      (32) Frankland, P. W. & Bontempi, B. The organization of recent and remote memories. Nat Rev Neurosci 6, 119-130 (2005). https://doi.org:10.1038/nrn1607

      (33) Jacobacci, F. et al. Rapid hippocampal plasticity supports motor sequence learning. Proc Natl Acad Sci U S A 117, 23898-23903 (2020). https://doi.org:10.1073/pnas.2009576117

      (34) Albouy, G. et al. Maintaining vs. enhancing motor sequence memories: respective roles of striatal and hippocampal systems. Neuroimage 108, 423-434 (2015). https://doi.org:10.1016/j.neuroimage.2014.12.049

      (35) Gais, S. et al. Sleep transforms the cerebral trace of declarative memories. Proc Natl Acad Sci U S A 104, 18778-18783 (2007). https://doi.org:10.1073/pnas.0705454104

      (36) Sterpenich, V. et al. Sleep promotes the neural reorganization of remote emotional memory. J Neurosci 29, 5143-5152 (2009). https://doi.org:10.1523/JNEUROSCI.0561-09.2009

      (37) Euston, D. R., Gruber, A. J. & McNaughton, B. L. The role of medial prefrontal cortex in memory and decision making. Neuron 76, 1057-1070 (2012). https://doi.org:10.1016/j.neuron.2012.12.002

      (38) van Kesteren, M. T., Fernandez, G., Norris, D. G. & Hermans, E. J. Persistent schema-dependent hippocampal-neocortical connectivity during memory encoding and postencoding rest in humans. Proc Natl Acad Sci U S A 107, 7550-7555 (2010). https://doi.org:10.1073/pnas.0914892107

      (39) van Kesteren, M. T., Ruiter, D. J., Fernandez, G. & Henson, R. N. How schema and novelty augment memory formation. Trends Neurosci 35, 211-219 (2012). https://doi.org:10.1016/j.tins.2012.02.001

      (40) Wagner, A. D. et al. Building memories: remembering and forgetting of verbal experiences as predicted by brain activity. Science (New York, N.Y.) 281, 1188-1191 (1998).

      (41) Ashe, J., Lungu, O. V., Basford, A. T. & Lu, X. Cortical control of motor sequences. Curr Opin Neurobiol 16, 213-221 (2006).

      (42) Hikosaka, O., Nakamura, K., Sakai, K. & Nakahara, H. Central mechanisms of motor skill learning. Curr Opin Neurobiol 12, 217-222 (2002).

      (43) Penhune, V. B. & Steele, C. J. Parallel contributions of cerebellar, striatal and M1 mechanisms to motor sequence learning. Behav. Brain Res. 226, 579-591 (2012). https://doi.org:10.1016/j.bbr.2011.09.044

      (44) Doyon, J. et al. Contributions of the basal ganglia and functionally related brain structures to motor learning. Behavioural brain research 199, 61-75 (2009). https://doi.org:10.1016/j.bbr.2008.11.012

      (45) Schendan, H. E., Searl, M. M., Melrose, R. J. & Stern, C. E. An FMRI study of the role of the medial temporal lobe in implicit and explicit sequence learning. Neuron 37, 1013-1025 (2003). https://doi.org:10.1016/s0896-6273(03)00123-5

      (46) Morris, R. G. M. Elements of a neurobiological theory of hippocampal function: the role of synaptic plasticity, synaptic tagging and schemas. The European journal of neuroscience 23, 2829-2846 (2006). https://doi.org:10.1111/j.1460-9568.2006.04888.x

      (47) Tse, D. et al. Schemas and memory consolidation. Science 316, 76-82 (2007). https://doi.org:10.1126/science.1135935

      (48) Berlot, E., Popp, N. J. & Diedrichsen, J. A critical re-evaluation of fMRI signatures of motor sequence learning. Elife 9 (2020). https://doi.org:10.7554/eLife.55241

      (49) Bonstrup, M. et al. A Rapid Form of Offline Consolidation in Skill Learning. Curr Biol 29, 1346-1351 e1344 (2019). https://doi.org:10.1016/j.cub.2019.02.049

      (50) Kornysheva, K. et al. Neural Competitive Queuing of Ordinal Structure Underlies Skilled Sequential Action. Neuron 101, 1166-1180 e1163 (2019). https://doi.org:10.1016/j.neuron.2019.01.018

      (51) Pan, S. C. & Rickard, T. C. Sleep and motor learning: Is there room for consolidation? Psychol Bull 141, 812-834 (2015). https://doi.org:10.1037/bul0000009

      (52) Rickard, T. C., Cai, D. J., Rieth, C. A., Jones, J. & Ard, M. C. Sleep does not enhance motor sequence learning. J Exp Psychol Learn Mem Cogn 34, 834-842 (2008). https://doi.org:10.1037/0278-7393.34.4.834

      (53) Brawn, T. P., Fenn, K. M., Nusbaum, H. C. & Margoliash, D. Consolidating the effects of waking and sleep on motor-sequence learning. J Neurosci 30, 13977-13982 (2010). https://doi.org:10.1523/JNEUROSCI.3295-10.2010

      (54) Bonstrup, M., Iturrate, I., Hebart, M. N., Censor, N. & Cohen, L. G. Mechanisms of offline motor learning at a microscale of seconds in large-scale crowdsourced data. NPJ Sci Learn 5, 7 (2020). https://doi.org:10.1038/s41539-020-0066-9

      (55) Gupta, M. W. & Rickard, T. C. Dissipation of reactive inhibition is sufficient to explain post-rest improvements in motor sequence learning. NPJ Sci Learn 7, 25 (2022). https://doi.org:10.1038/s41539-022-00140-z

      (56) Jacobacci, F. et al. Rapid hippocampal plasticity supports motor sequence learning. Proceedings of the National Academy of Sciences 117, 23898-23903 (2020).

      (57) Brooks, E., Wallis, S., Hendrikse, J. & Coxon, J. Micro-consolidation occurs when learning an implicit motor sequence, but is not influenced by HIIT exercise. NPJ Sci Learn 9, 23 (2024). https://doi.org:10.1038/s41539-024-00238-6

      (58) Deleglise, A. et al. Human motor sequence learning drives transient changes in network topology and hippocampal connectivity early during memory consolidation. Cereb Cortex 33, 6120-6131 (2023). https://doi.org:10.1093/cercor/bhac489

      (59) Buzsaki, G. Hippocampal sharp wave-ripple: A cognitive biomarker for episodic memory and planning. Hippocampus 25, 1073-1188 (2015). https://doi.org:10.1002/hipo.22488

      (60) Gupta, M. W. & Rickard, T. C. Comparison of online, offline, and hybrid hypotheses of motor sequence learning using a quantitative model that incorporate reactive inhibition. Sci Rep 14, 4661 (2024). https://doi.org:10.1038/s41598-024-52726-9

      (61) Das, A., Karagiorgis, A., Diedrichsen, J., Stenner, M.-P. & Azanon, E. “Micro-offline gains” convey no benefit for motor skill learning. bioRxiv, 2024.2007.2011.602795 (2024). https://doi.org:10.1101/2024.07.11.602795

      (62) Mylonas, D. et al. Maintenance of Procedural Motor Memory across Brief Rest Periods Requires the Hippocampus. J Neurosci 44 (2024). https://doi.org:10.1523/JNEUROSCI.1839-23.2024

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      By examining the prevalence of interactions with ancient amino acids of coenzymes in ancient versus recent folds, the authors noticed an increased interaction propensity for ancient interactions. They infer from this that coenzymes might have played an important role in prebiotic proteins.

      Strengths:

      (1) The analysis, which is very straightforward, is technically correct. However, the conclusions might not be as strong as presented.

      (2) This paper presents an excellent summary of contemporary thought on what might have constituted prebiotic proteins and their properties.

      (3) The paper is clearly written.

      We are grateful for the kind comments of the reviewer on our manuscript. However, we would like to clarify a possible misunderstanding in the summary of our study. Specifically, analysis of "ancient versus recent folds" was not really reported in our results. Our analysis concerned "coenzyme age" rather than the "protein folds age" and was focused mainly on interaction with early vs. late amino acids in protein sequence. While structural propensities of the coenzyme binding sites were also analyzed, no distinction on the level of ancient vs. recent folds was assumed and this was only commented on in the discussion, based on previous work of others.

      Weaknesses:

      (1) The conclusions might not be as strong as presented. First of all, while ancient amino acids interact less frequently in late with a given coenzyme, maybe this just reflects the fact that proteins that evolved later might be using residues that have a more favorable binding free energy.

      We would like to point out that our dataset of coenzyme-binding proteins makes no distinction between proteins that evolved early and those that evolved late. The aim of our analysis was purely to observe trends in the age of amino acids vs. the age of coenzymes. While no direct inference about early life can be made from this, as all the proteins are from extant life (as highlighted in the discussion of our work), our goal was to look for intrinsic propensities of early vs. late amino acids in binding to the different coenzyme entities. Indeed, very early interactions would be smeared by the eons of evolutionary history (perhaps also towards more favourable binding free energy, as the reviewer also points out). Nevertheless, significant trends have been recorded across the PDB dataset, pointing to different propensities and mechanistic properties of the binding events. Rather than to a specific evolutionary past, our data therefore point to a “capacity” of the early amino acids to bind certain coenzymes, and we believe that this is the major (and standing) conclusion of our work, along with the properties of such interactions. In our revised version, we will carefully go through all the conclusions and make sure that this message stands out, but we are confident that the following concluding sentences, copied from the abstract and the discussion of our manuscript, fully comply with our data:

      “These results imply the plausibility of a coenzyme-peptide functional collaboration preceding the establishment of the Central Dogma and full protein alphabet evolution”

      “While no direct inferences about distant evolutionary past can be drawn from the analysis of extant proteins, the principles guiding these interactions can imply their potential prebiotic feasibility and significance.”

      “This implies that late amino acids would not be necessarily needed for the sovereignty of coenzyme-peptide interplay.”

      We would also like to add that proteins that evolved later might not always have a more favourable free energy of binding. Musil et al., 2021 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8294521/) showed, using the haloalkane dehalogenase DhaA as an example, that ancestral sequence reconstruction is a powerful tool for designing more stable, but also more active, proteins. Ancestral sequence reconstruction relies on inferring ancient states of protein families to suggest mutations that will lead to proteins more stable than those currently existing. Their study did not explore ligand-protein interactions specifically, but showed that ancient states often show more favourable properties than modern proteins.

      (2) What about other small molecules that existed in the prebiotic soup? Do they also prefer such ancient amino acids? If so, this might reflect the interaction propensity of specific amino acids rather than the inferred important role of coenzymes.

      We appreciate the reviewer's comment regarding other small molecules, which we assume refers mainly to metal ions (i.e. inorganic cofactors). We completely agree with the reviewer that such interactions are of utmost importance to the origins of life. They were intentionally not part of our study, as they have already been studied by others (e.g. Bromberg et al., 2022; reviewed in Frenkel-Pinter et al., 2020) and by us (Fried et al., 2022). For example, it is noteworthy that prebiotically relevant metal binding sites (e.g. of Mg2+) are enriched in early amino acids such as Asp and Glu, while binding sites of more recent metals (e.g. Cu and Zn) are enriched in the late amino acids His and Cys (Fried et al., 2022). At the same time, comparable analyses of amino acid - coenzyme trends were not available.

      Nevertheless, the involvement of metal ions in the coenzyme binding sites was also studied here and pointed to their greater involvement with the Ancient coenzymes. In the revised version of the manuscript, we will be happy to expand the discussion of the studies concerning inorganic cofactors.

      (3) Perhaps the conclusions just reflect the types of active sites that evolved first and nothing more.

      We partly agree with the reviewer on this point, but not with its being listed as a weakness of our study, nor with the “nothing more” notion. Understanding the properties of the earliest binding sites is key to bridging the gap between prebiotic chemistry and biochemistry. The potential of peptides preceding ribosomal synthesis (and the full alphabet evolution), along with prebiotically plausible coenzymes, addresses exactly this gap, which is currently not understood.

      Reviewer #2 (Public Review):

      I enjoyed reading this paper and appreciate the careful analysis performed by the investigators examining whether 'ancient' cofactors are preferentially bound by the first-available amino acids, and whether later 'LUCA' cofactors are bound by the late-arriving amino acids. I've always found this question fascinating as there is a contradiction in inorganic metal-protein complexes (not what is focused on here). Metal coordination of Fe, Ni heavily relies on softer ligands like His and Cys - which are by most models latecomer amino acids. There are no traces of thiols or imidazoles in meteorites - although work by Dvorkin has indicated that could very well be due to acid degradation during extraction. Chris Dupont (PNAS 2005) showed that metal speciation in the early earth (such as proposed by Anbar and prior RJP Williams) matched the purported order of fold emergence.

      As such, cofactor-protein interactions as a driving force for evolution has always made sense to me and I admittedly read this paper biased in its favor. But to make sure, I started to play around with the data that the authors kindly and importantly shared in the supplementary files. Here's what I found:

      Point 1: The correlation between abundance of amino acids and protein age is dominated by glycine. There is a small, but visible difference in old vs new amino acid fractional abundance between Ancient and LUCA proteins (Figure 3, Supplementary Table 3). However, the bias is not evenly distributed among the amino acids - which Figure 4A shows but is hard to digest as presented. So instead I used the spreadsheet in Supplement 3 to calculate the fractional difference FDaa = F(old aa)-F(new aa). As expected from Figure 3, the mean FD for Ancient is greater than the mean FD for LUCA. But when you look at the same table for each amino acid FDcofactor = F(ancient cofactor) - F(LUCA cofactor), you now see that the bias is not evenly distributed between older and newer amino acids at all. In fact, most of the difference can be explained by glycine (FDcofactor = 3.8) and the rest by also including tryptophan (FDcofactor = -3.8). If you remove these two amino acids from the analysis, the trend seen in Figure 3 all but disappears.

      Troubling - so you might argue that Gly is the oldest of the old and Trp is the newest of the new so the argument still stands. Unfortunately, Gly is a lot of things - flexible, small, polar - so what is the real correlation, age, or chemistry? This leads to point 2.

      We truly appreciate the effort the reviewer made in re-examining the data and the thoughtful, deeper analysis. We agree that this deserves further discussion of our data. As invited by the reviewer, we repeated the analysis on the whole dataset. First, we would like to point out that the reviewer was most probably referring to Supplementary Fig. 2 (and not 3, which concerns protein folds). While the difference between Ancient and LUCA coenzyme binding is indeed most pronounced for Gly and Trp, we could not confirm that the trend disappears if those two amino acids are removed from the analysis (additional FDcofactors of 3.2 and -3.2 are observed for the early and late amino acids, resp.), as seen in Table I below. The main additional contributors to this effect are Asp (FD of 2.1) and Ser (FD of 1.8) among the early amino acids and Arg (FD of -2.6) and Cys (FD of -1.7) among the late amino acids. Hence, while we agree with the reviewer that Gly and Trp (the oldest and the youngest) contribute to this effect the most, we disagree that the trend reduces to these two amino acids.

      In addition, the most recent coenzyme temporality (the Post-LUCA) was neglected in the reviewer’s analysis. The difference between F(old) and F(new) is even more pronounced in Post-LUCA than in LUCA, vs. Ancient (Table II), and depends much less on Trp. Instead, Asp, Ser, Leu, Phe, and Arg dominate the observed phenomenon (Table I). This further supports our disagreement with the reviewer’s point. Nevertheless, we remain grateful for this discussion and we will happily include this additional analysis in the Supplementary Material of our revised manuscript.

      Author response table 1.

      Amino acid fractional difference of all coenzymes at residue level

      Author response table 2.

      Amino acid fractional difference of all coenzymes
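
      The fractional-difference bookkeeping discussed under Point 1 can be sketched as follows. This is only an illustration under stated assumptions, not the script used by the reviewer or the authors: the 10/10 split into early and late amino acids is the conventional one and is assumed here, the residue counts are made up, and the names `fractions`, `fd_cofactor` and `class_sum` are hypothetical.

```python
# Sketch of FDcofactor = F(ancient cofactor) - F(LUCA cofactor) per amino acid.
# ASSUMPTION: conventional early/late split; not taken from the manuscript.
EARLY = {"Gly", "Ala", "Asp", "Glu", "Val", "Ser", "Ile", "Leu", "Pro", "Thr"}
LATE = {"Arg", "Asn", "Cys", "Gln", "His", "Lys", "Met", "Phe", "Trp", "Tyr"}


def fractions(counts):
    """Convert residue counts at binding sites into percentages."""
    total = sum(counts.values())
    return {aa: 100.0 * n / total for aa, n in counts.items()}


def fd_cofactor(ancient_counts, luca_counts):
    """Per-amino-acid difference in percentage, Ancient minus LUCA."""
    f_anc = fractions(ancient_counts)
    f_luca = fractions(luca_counts)
    residues = sorted(set(f_anc) | set(f_luca))
    return {aa: f_anc.get(aa, 0.0) - f_luca.get(aa, 0.0) for aa in residues}


def class_sum(fd, group, exclude=()):
    """Sum FD over one age class, optionally leaving some residues out.

    No renormalisation is done after exclusion; this is a simplification.
    """
    return sum(v for aa, v in fd.items() if aa in group and aa not in exclude)


# HYPOTHETICAL counts, for illustration only (not data from the study).
ancient = {"Gly": 30, "Asp": 20, "Ser": 20, "Arg": 15, "Cys": 10, "Trp": 5}
luca = {"Gly": 20, "Asp": 18, "Ser": 17, "Arg": 20, "Cys": 13, "Trp": 12}

fd = fd_cofactor(ancient, luca)
early_all = class_sum(fd, EARLY)
late_all = class_sum(fd, LATE)
early_rest = class_sum(fd, EARLY, exclude=("Gly", "Trp"))
late_rest = class_sum(fd, LATE, exclude=("Gly", "Trp"))
```

      Because the percentages in each coenzyme class sum to 100, the FD values summed over all amino acids cancel, so the early and late totals are equal in magnitude and opposite in sign. The robustness question raised by the reviewer then amounts to whether `early_rest` and `late_rest` stay clearly non-zero once Gly and Trp are left out.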

      Point 2 - The correlation is dominated by phosphate.

      In the ancient cofactor list, all but 4 comprise at least one phosphate (SAM, tetrahydrofolic acid, biopterin, and heme). Except for SAM, the rest have very low Gly abundance. The overall high Gly abundance in the ancient enzymes is due to the chemical property of glycine that can occupy the right-hand side of the Ramachandran plot. This allows it to make the alternating alpha-left/alpha-right conformation of the P-loop forming Milner-White's anionic nest. If you remove phosphate binding folds from the analysis the trend in Figure 3 vanishes.

      Likewise, Trp is an important functional residue for binding quinones and tuning its redox potential. The LUCA cofactor set is dominated by quinone and derivatives, which likely drives up the new amino acid score for this class of cofactors.

      Once again, we are thankful to the reviewer for raising this point. The role of Gly in the anionic nests proposed by Milner-White and Russell, as well as the role of Trp in quinone binding, are important points that we would be happy to highlight more in the discussion of the revised manuscript. Nevertheless, we disagree that the trends reduce only to the phosphate-containing coenzymes and, importantly, that “the trend in Figure 3 vanishes” upon their removal. Tables III and IV (below) show the data for coenzymes excluding those with a phosphate moiety, and the trend in Fig. 3 remains, albeit less pronounced.

      Author response table 3.

      Amino acid fractional difference of non-phosphate containing coenzymes

      Author response table 4.

      Amino acid fractional difference of non-phosphate containing coenzymes at residue level

      In summary, while I still believe the premise that cofactors drove the shape of peptides and the folds that came from them - and that Rossmann folds are ancient phosphate-binding proteins, this analysis does not really bring anything new to these ideas that have already been stated by Tawfik/Longo, Milner-White/Russell, and many others.

      I did this analysis ad hoc on a slice of the data the authors provided and could easily have missed something and I encourage the authors to check my work. If it holds up it should be noted that negative results can often be as informative as strong positive ones. I think the signal here is too weak to see in the noise using the current approach.

      We are grateful to the reviewer for encouraging a further look at our data. While we hope that the analysis on the whole dataset (listed in Tables I - IV) will change the reviewer’s standpoint on our work, we would still like to comment on the questioned novelty of our results. In fact, the extraordinary works by Tawfik/Longo and Milner-White/Russell (which were cited in our manuscript multiple times) were one of the motivations for this study. We take the opportunity to copy the part of our discussion that specifically highlights the relevance of their studies, and points out the contribution of our work with respect to theirs.

      “While all the coenzymes bind preferentially to protein residue sidechains, more backbone interactions appear in the ancient coenzyme class when compared to others. This supports an earlier hypothesis that functions of the earliest peptides (possibly of variable compositions and lengths) would be performed with the assistance of the main chain atoms rather than their sidechains (Milner-White and Russell 2011). Longo et al., recently analyzed binding sites of different phosphate-containing ligands which were arguably of high relevance during earliest stages of life, connecting all of today’s core metabolism (Longo et al., 2020 (b)). They observed that unlike the evolutionary younger binding motifs (which rely on sidechain binding), the most ancient lineages indeed bind to phosphate moieties predominantly via the protein backbone. Our analysis assigns this phenomenon primarily to interactions via early amino acids that (as mentioned above) are generally enriched in the binding interface of the ancient coenzymes. This implies that late amino acids would not be necessarily needed for the sovereignty of coenzyme-peptide interplay.”

      Unlike any previous work, our study involves all the major coenzymes (not just the phosphate-containing ones) and is based on their evolutionary age, as well as the age of amino acids. It is the first PDB-wide systematic evolutionary analysis of coenzyme-amino acid binding. Besides confirming some earlier theoretical assertions (such as the role of backbone interactions in early peptide-coenzyme evolution) and observations (such as the occurrence of the ancient phosphate-containing coenzymes in the oldest protein folds), it uncovers substantial novel knowledge. For example, (i) enrichment of early amino acids in the binding of ancient coenzymes, vs. enrichment of late amino acids in the binding of LUCA and Post-LUCA coenzymes, (ii) the trends in secondary structure content of the binding sites of coenzymes of different temporalities, (iii) increased involvement of metal ions in the ancient coenzyme binding events, and (iv) the capacity of only early amino acids to bind ancient coenzymes. In our humble opinion, all of these points make important contributions to closing the peptide-coenzyme knowledge gap, which has been discussed in a number of previous studies.

    1. Author response:

      eLife assessment

      This potentially useful study involves neuro-imaging and electrophysiology in a small cohort of congenital cataract patients after sight recovery and age-matched control participants with normal sight. It aims to characterize the effects of early visual deprivation on excitatory and inhibitory balance in the visual cortex. While the findings are taken to suggest the existence of persistent alterations in Glx/GABA ratio and aperiodic EEG signals, the evidence supporting these claims is incomplete. Specifically, small sample sizes, lack of a specific control cohort, and other methodological limitations will likely restrict the usefulness of the work, with relevance limited to scientists working in this particular subfield.

      As pointed out in the public reviews, there are only very few human models which allow for assessing the role of early experience on neural circuit development. While the prevalent research in permanent congenital blindness reveals the response and adaptation of the developing brain to an atypical situation (blindness), research in sight restoration addresses the question of whether and how atypical development can be remediated if typical experience (vision) is restored. The literature on the role of visual experience in the development of E/I balance in humans, assessed via Magnetic Resonance Spectroscopy (MRS), has been limited to a few studies on congenital permanent blindness. Thus, we assessed sight recovery individuals with a history of congenital blindness, as limited evidence from other researchers indicated that the visual cortex E/I ratio might differ compared to normally sighted controls.

      Individuals with total bilateral congenital cataracts who remained untreated until later in life are extremely rare, particularly if only carefully diagnosed patients are included in a study sample. A sample size of 10 patients is, at the very least, typical of past studies in this population, even for exclusively behavioral assessments. In the present study, in addition to behavioral assessment as an indirect measure of sensitive periods, we investigated participants with two neuroimaging methods (Magnetic Resonance Spectroscopy and electroencephalography) to directly assess the neural correlates of sensitive periods in humans. The electroencephalography data allowed us to link the results of our small sample to findings documented in large cohorts of both sight recovery individuals and permanently congenitally blind individuals. As pointed out in a recent editorial recommending an “exploration-then-estimation procedure” (“Consideration of Sample Size in Neuroscience Studies,” 2020), exploratory studies like ours provide crucial direction and specific hypotheses for future work.

      We included an age-matched sighted control group recruited from the same community, measured in the same scanner and laboratory, to assess whether early experience is necessary for a typical excitatory/inhibitory (E/I) ratio to emerge in adulthood. The present findings indicate that this is indeed the case. Based on these results, a possible question to answer in future work, with individuals who had developmental cataracts, is whether later visual deprivation causes similar effects. Note that even if visual deprivation at a later stage in life caused similar effects, the current results would not be invalidated; on the contrary, they are essential for understanding future work on late (permanent or transient) blindness.

      Thus, we think that the present manuscript has far reaching implications for our understanding of the conditions under which E/I balance, a crucial characteristic of brain functioning, emerges in humans.

      Finally, our manuscript is one of the first studies to relate MRS neurotransmitter concentrations to parameters of EEG aperiodic activity. Since current research uses aperiodic activity as a correlate of the E/I ratio, and partially of higher cognitive functions, we think that our manuscript additionally contributes to a better understanding of what might be measured with aperiodic neurophysiological activity.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this human neuroimaging and electrophysiology study, the authors aimed to characterize the effects of a period of visual deprivation in the sensitive period on excitatory and inhibitory balance in the visual cortex. They attempted to do so by comparing neurochemistry conditions ('eyes open', 'eyes closed') and resting state, and visually evoked EEG activity between ten congenital cataract patients with recovered sight (CC), and ten age-matched control participants (SC) with normal sight.

      First, they used magnetic resonance spectroscopy to measure in vivo neurochemistry from two locations, the primary location of interest in the visual cortex, and a control location in the frontal cortex. Such voxels are used to provide a control for the spatial specificity of any effects because the single-voxel MRS method provides a single sampling location. Using MR-visible proxies of excitatory and inhibitory neurotransmission, Glx and GABA+ respectively, the authors report no group effects in GABA+ or Glx, no difference in the functional conditions 'eyes closed' and 'eyes open'. They found an effect of the group in the ratio of Glx/GABA+ and no similar effect in the control voxel location. They then performed multiple exploratory correlations between MRS measures and visual acuity, and reported a weak positive correlation between the 'eyes open' condition and visual acuity in CC participants.

      The same participants then took part in an EEG experiment. The authors selected only two electrodes placed in the visual cortex for analysis and reported a group difference in an EEG index of neural activity, the aperiodic intercept, as well as the aperiodic slope, considered a proxy for cortical inhibition. They report an exploratory correlation between the aperiodic intercept and Glx in one out of three EEG conditions.
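
      The aperiodic intercept and slope mentioned in this summary describe the 1/f-like background of the EEG power spectrum. As a rough illustration only, and not the authors' analysis pipeline, the sketch below estimates both parameters with a straight-line fit in log-log space on a synthetic spectrum. The function name `fit_aperiodic`, the 1-20 Hz fitting range and the synthetic values are assumptions made for this example; dedicated spectral parameterisation tools additionally model oscillatory peaks, which a plain linear fit ignores.

```python
import numpy as np


def fit_aperiodic(freqs, power, fmin=1.0, fmax=20.0):
    """Fit log10(power) = intercept - exponent * log10(freq) over [fmin, fmax].

    Returns (intercept, exponent). The exponent is the negated slope of the
    fitted line, so a steeper spectral decay gives a larger exponent.
    """
    mask = (freqs >= fmin) & (freqs <= fmax)
    # np.polyfit returns coefficients highest degree first: [slope, intercept]
    slope, intercept = np.polyfit(np.log10(freqs[mask]), np.log10(power[mask]), 1)
    return intercept, -slope


# SYNTHETIC spectrum with known parameters (hypothetical values, not study data).
freqs = np.arange(1.0, 40.5, 0.5)
true_intercept, true_exponent = 1.0, 1.5
power = 10.0 ** true_intercept / freqs ** true_exponent

intercept, exponent = fit_aperiodic(freqs, power)
```

      On real recordings the spectrum would first be estimated from the EEG signal, and the choice of frequency range can change the fitted values, so comparisons between groups are only meaningful when the same range and method are used for everyone.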

      The authors report the difference in E/I ratio, and interpret the lower E/I ratio as representing an adaptation to visual deprivation, which would have initially caused a higher E/I ratio. Although intriguing, the strength of evidence in support of this view is not strong. Amongst the limitations are the low sample size, a critical control cohort that could provide evidence for a higher E/I ratio in CC patients without recovered sight for example, and lower data quality in the control voxel.

      Strengths of study:

      How sensitive period experience shapes the developing brain is an enduring and important question in neuroscience. This question has been particularly difficult to investigate in humans. The authors recruited a small number of sight-recovered participants with bilateral congenital cataracts to investigate the effect of sensitive period deprivation on the balance of excitation and inhibition in the visual brain using measures of brain chemistry and brain electrophysiology. The research is novel, and the paper was interesting and well-written.

      Limitations:

      (1.1) Low sample size. Ten for CC and ten for SC, and a further two SC participants were rejected due to a lack of frontal control voxel data. The sample size limits the statistical power of the dataset and increases the likelihood of effect inflation.

      Applying strict criteria, we only included individuals who were born with no patterned vision in the CC group. The population of individuals who have remained untreated past infancy is small in India, despite a higher prevalence of childhood cataract than Germany. Indeed, from the original 11 CC and 11 SC participants tested, one participant each from the CC and SC group had to be rejected, as their data had been corrupted, resulting in 10 participants in each group.

      It was a challenge to recruit participants from this rare group with no history of neurological diagnosis/intake of neuromodulatory medications, who were able and willing to undergo both MRS and EEG. For this study, data collection took more than 1.5 years.

      We safeguarded the validity of our results with two measures. First, we assessed not just MRS but also EEG measures of the E/I ratio. The latter allowed us to link our results to a larger population of CC individuals; that is, we replicated in our sub-group the results of a larger group of 38 individuals (Ossandón et al., 2023).

      Second, we included a control voxel. As predicted, all group effects were restricted to the occipital voxel.

      (1.2) Lack of specific control cohort. The control cohort has normal vision. The control cohort is not specific enough to distinguish between people with sight loss due to different causes and patients with congenital cataracts with co-morbidities. Further data from more specific populations, such as patients whose cataracts have not been removed, with developmental cataracts, or congenitally blind participants, would greatly improve the interpretability of the main finding. The lack of a more specific control cohort is a major caveat that limits a conclusive interpretation of the results.

      The existing work on visual deprivation and neurochemical changes, as assessed with MRS, has been limited to permanent congenital blindness. In fact, most of the studies on permanent blindness included only congenitally blind or early blind humans (Coullon et al., 2015; Weaver et al., 2013), or, in separate studies, only late-blind individuals (Bernabeu et al., 2009). Accordingly, we started with the most “extreme” visual deprivation model, sight recovery after congenital blindness. If we had not observed any group difference compared to normally sighted controls, investigating other groups might have been trivial. Based on our results, subsequent studies in late-blind individuals, and then in individuals with developmental cataracts, can be planned with clear hypotheses.

      (1.3) MRS data quality differences. Data quality in the control voxel appears worse than in the visual cortex voxel. The frontal cortex MRS spectrum shows far broader linewidth than the visual cortex (Supplementary Figures). Compared to the visual voxel, the frontal cortex voxel has less defined Glx and GABA+ peaks; lower GABA+ and Glx concentrations, lower NAA SNR values; lower NAA concentrations. If the data quality is a lot worse in the FC, then small effects may not be detectable.

      Worse data quality in the frontal than the visual cortex has been repeatedly observed in the MRS literature, attributable to magnetic field distortions (Juchem & Graaf, 2017) resulting from the proximity of the region to the sinuses (recent example: Rideaux et al., 2022). Nevertheless, we chose the frontal control region rather than a parietal voxel, given the potential neurochemical changes in multisensory regions of the parietal cortex due to blindness. Such reorganization would be less likely in frontal areas associated with higher cognitive functions. Further, prior MRS studies of the visual cortex have used the frontal cortex as a control region as well (Pitchaimuthu et al., 2017; Rideaux et al., 2022).

      In the present study, we checked that the frontal cortex datasets for Glx and GABA+ concentrations were of sufficient quality: the fit error was below 8.31% in both groups (Supplementary Material S3). For reference, Mikkelsen et al. reported a mean GABA+ fit error of 6.24 +/- 1.95% from a posterior cingulate cortex voxel across 8 GE scanners, using the Gannet pipeline. No absolute cutoffs have been proposed for fit errors. However, MRS studies in special populations (I/E ratio assessed in narcolepsy (Gao et al., 2024), GABA concentration assessed in Autism Spectrum Disorder (Maier et al., 2022)) have used frontal cortex data with a fit error of <10% to identify differences between cohorts (Gao et al., 2024; Pitchaimuthu et al., 2017). Based on the literature, MRS data from the frontal voxel of the present study would have been of sufficient quality to uncover group differences.

      In the revised manuscript, we will add the recently published MRS quality assessment form to the supplementary materials. Additionally, we would like to point to our a priori prediction of group differences for the visual cortex voxel, but not for the frontal cortex voxel.

      (1.4) Because of the direction of the difference in E/I, the authors interpret their findings as representing signatures of sight improvement after surgery without further evidence, either within the study or from the literature. However, the literature suggests that plasticity and visual deprivation drive the E/I index up rather than down. Decreasing GABA+ is thought to facilitate experience-dependent remodelling. What evidence is there that cortical inhibition increases in response to a visual cortex that is over-sensitised due to congenital cataracts? Without further experimental or literature support this interpretation remains very speculative.

      Indeed, higher inhibition was not predicted, which we attempt to reconcile in our discussion section. We base our discussion mainly on the non-human animal literature, which has shown evidence of homeostatic changes after prolonged visual deprivation in the adult brain (Barnes et al., 2015). It is also interesting to note that after monocular deprivation in adult humans, resting GABA+ levels decreased in the visual cortex (Lunghi et al., 2015). Assuming that after delayed sight restoration, adult neuroplasticity mechanisms must be employed, these studies would predict a “balancing” of the increased excitatory drive following sight restoration by a commensurate increase in inhibition (Keck et al., 2017). Additionally, the EEG results of the present study allowed for speculation regarding the underlying neural mechanisms of an altered E/I ratio. The aperiodic EEG activity suggested higher spontaneous spiking (increased intercept) and increased inhibition (steeper aperiodic slope between 1-20 Hz) in CC vs SC individuals (Ossandón et al., 2023).

      In the revised manuscript, we will more clearly indicate that these speculations are based primarily on non-human animal work, due to the lack of human studies on the subject.
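
      To make the two aperiodic parameters referred to in this response concrete, the following is a minimal sketch of how an intercept and a slope can be read off a power spectrum by fitting a straight line in log-log space over 1-20 Hz. It is an illustration under stated assumptions, not the analysis pipeline of the study: the function name `fit_aperiodic` and the synthetic spectrum are hypothetical, and the sketch does not separate periodic peaks (such as alpha) from the aperiodic component, which dedicated tools do.

```python
import math


def fit_aperiodic(freqs, power, fmin=1.0, fmax=20.0):
    """Least-squares line through log10(power) vs log10(frequency).

    Returns (intercept, slope). The intercept is log10 power at 1 Hz
    (where log10(frequency) = 0); a "steeper" slope is a more negative one.
    """
    pts = [(math.log10(f), math.log10(p))
           for f, p in zip(freqs, power) if fmin <= f <= fmax]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Synthetic (hypothetical) 1/f-like spectrum: power = 10**1.5 * f**-1.2
freqs = [0.5 * k for k in range(1, 81)]          # 0.5 Hz to 40 Hz
power = [10 ** 1.5 * f ** -1.2 for f in freqs]
intercept, slope = fit_aperiodic(freqs, power)
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
```

      In this parameterization, a higher intercept corresponds to a broadband upward shift of the spectrum and a more negative slope to a faster fall-off of power with frequency, which are the two quantities interpreted above.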

      (1.5) Heterogeneity in the patient group. Congenital cataract (CC) patients experienced a variety of duration of visual impairment and were of different ages. They presented with co-morbidities (absorbed lens, strabismus, nystagmus). Strabismus has been associated with abnormalities in GABAergic inhibition in the visual cortex. The possible interactions with residual vision and confounds of co-morbidities are not experimentally controlled for in the correlations, and not discussed.

      The goal of the present study was to assess whether we would observe changes in E/I ratio after restoring vision at all. We would not have included patients without nystagmus in the CC group of the present study, since it would have been unlikely that they experienced congenital patterned visual deprivation. Amongst diagnosticians, nystagmus or strabismus might not be considered genuine “comorbidities” that emerge in people with congenital cataracts. Rather, these are consequences of congenital visual deprivation, which we employed as diagnostic criteria. Similarly, absorbed lenses are clear signs that cataracts were congenital. As in other models of experience-dependent brain development (e.g. the extant literature on congenital permanent blindness, including anophthalmic individuals (Coullon et al., 2015; Weaver et al., 2013)), some uncertainty remains regarding whether the (remaining, in our case) abnormalities of the eye, or the blindness they caused, are the factors driving neural changes. In the case of people with reversed congenital cataracts, at least the retina is considered to be intact, as they would otherwise not receive cataract removal surgery.

      However, we consider it unlikely that strabismus caused the group differences, because the present study shows group differences in the Glx/GABA+ ratio at rest, regardless of eye opening or eye closure, for which strabismus would have caused distinct effects. By contrast, the link between GABA concentration and, for example, interocular suppression in strabismus has so far been documented during visual stimulation (Mukerji et al., 2022; Sengpiel et al., 2006), and differed in direction depending on the amblyopic vs. non-amblyopic eye. Further, one MRS study did not find group differences in GABA concentration between the visual cortices of 16 amblyopic individuals and sighted controls (Mukerji et al., 2022), supporting the view that the differences in Glx/GABA+ concentration which we observed were driven by congenital deprivation, and not amblyopia-associated visual acuity or eye movement differences.

      In the revised manuscript, we will discuss the inclusion criteria in more detail, and the aforementioned reasons why our data remains interpretable.

      (1.6) Multiple exploratory correlations were performed to relate MRS measures to visual acuity (shown in Supplementary Materials), and only specific ones were shown in the main document. The authors describe the analysis as exploratory in the 'Methods' section. Furthermore, the correlation between visual acuity and E/I metric is weak, and not corrected for multiple comparisons. The results should be presented as preliminary, as no strong conclusions can be made from them. They can provide a hypothesis to test in a future study.

      In the revised manuscript, we will clearly indicate that the exploratory correlation analyses are reported to put forth hypotheses for future studies.

      (1.7) P.16 Given the correlation of the aperiodic intercept with age ("Age negatively correlated with the aperiodic intercept across CC and SC individuals, that is, a flattening of the intercept was observed with age"), age needs to be controlled for in the correlation between neurochemistry and the aperiodic intercept. Glx has also been shown to negatively correlate with age.

      The correlation between chronological age and aperiodic intercept was observed across groups, but the correlation between Glx and the intercept of the aperiodic EEG activity was seen only in the CC group, even though the SC group was matched for age. Thus, such a correlation was very unlikely to be predominantly driven by an effect of chronological age.

      In the revised manuscript, we will add the linear regressions with age as a covariate included below, for the relationship between aperiodic intercept and Glx concentration in the CC group. 

      a. A linear regression was conducted within the CC group to predict the intercept during visual stimulation, based on age and visual cortex Glx concentration. The results of the regression analysis indicated that the model explained a significant proportion of the variance in the aperiodic intercept, R² = 0.82, F(2,7) = 16.1, p = 0.0024. Note that the coefficient for age was not significant, β = 0.007, t(7) = 0.82, p = 0.439. The regression coefficients and their respective statistics are presented in Author response table 1.

      Author response table 1.

      Regression Analysis Summary for Predicting Aperiodic Intercept (Visual Stimulation) in the CC group

      b. A linear regression was conducted to predict the intercept during eye opening at rest, based on age and visual cortex Glx concentration. The results of the regression analysis indicated that the model explained a significant proportion of the variance in the aperiodic intercept, R² = 0.842, F(2,7) = 18.6, p = 0.00159. Note that the coefficient for age was not significant, β = −0.005, t(7) = −0.90, p = 0.400. The regression coefficients and their respective statistics are presented in Author response table 2.

      Author response table 2.

      Regression Analysis Summary for Predicting Aperiodic Intercept (Eyes Open) in the CC group

      c. Given that the Glx coefficient is significant in both models and age does not significantly predict either outcome, it can be concluded that Glx predicts the aperiodic intercept independently of age.
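
      As a simple cross-check of the model-level statistics in (a) and (b), the overall test statistic of a linear regression (an F statistic with 2 and 7 degrees of freedom for 10 participants and two predictors) can be recomputed from R² alone. The sketch below uses only the values reported above; the function names are hypothetical, and the closed-form p-value is valid only when the numerator has exactly two degrees of freedom. Small deviations from the reported statistics are expected because R² is reported in rounded form.

```python
def overall_f(r_squared, n_obs, n_predictors):
    """Overall F statistic of a linear regression, computed from R^2."""
    df_model = n_predictors
    df_resid = n_obs - n_predictors - 1
    f_value = (r_squared / df_model) / ((1.0 - r_squared) / df_resid)
    return f_value, df_model, df_resid


def f_pvalue_two_numerator_df(f_value, df_resid):
    """Upper-tail p-value of F(2, df_resid).

    Closed form that holds only for 2 numerator degrees of freedom.
    """
    return (1.0 + 2.0 * f_value / df_resid) ** (-df_resid / 2.0)


# Reported values: n = 10 CC participants, predictors = age and Glx.
for label, r2, reported_stat in [("a", 0.82, 16.1), ("b", 0.842, 18.6)]:
    f_from_r2, df1, df2 = overall_f(r2, n_obs=10, n_predictors=2)
    p_value = f_pvalue_two_numerator_df(reported_stat, df2)
    print(f"({label}) F({df1},{df2}) from rounded R^2 = {f_from_r2:.2f}; "
          f"p for reported statistic = {p_value:.4f}")
```

      For both models, the value recomputed from the rounded R² lies close to the reported statistic, and the p-values derived from the reported statistics agree with those given in (a) and (b).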

      (1.8) Multiple exploratory correlations were performed to relate MRS to EEG measures (shown in Supplementary Materials), and only specific ones were shown in the main document. Given the multiple measures from the MRS, the correlations with the EEG measures were exploratory, as stated in the text, p.16, and in Figure 4. Yet the introduction said that there was a prior hypothesis "We further hypothesized that neurotransmitter changes would relate to changes in the slope and intercept of the EEG aperiodic activity in the same subjects." It would be great if the text could be revised for consistency and the analysis described as exploratory.

      In the revised manuscript, we will improve the phrasing. We consider the correlation analyses as exploratory due to our sample size and the absence of prior work. However, we did hypothesize that both MRS and EEG markers would concurrently be altered in CC vs SC individuals.

      (1.9) The analysis for the EEG needs to take more advantage of the available data. As far as I understand, only two electrodes were used, yet far more were available as seen in their previous study (Ossandon et al., 2023). The spatial specificity is not established. The authors could use the frontal cortex electrode (FP1, FP2) signals as a control for spatial specificity in the group effects, or even better, all available electrodes and correct for multiple comparisons. Furthermore, they could use the aperiodic intercept vs Glx in SC to evaluate the specificity of the correlation to CC.

      The aperiodic intercept and slope did not differ between CC and SC individuals for Fp1 and Fp2, suggesting the spatial specificity of the results. In the revised manuscript, we will add this analysis to the supplementary material.

      Author response image 1.

      Aperiodic intercept (top) and slope (bottom) for congenital cataract-reversal (CC, red) and age-matched normally sighted control (SC, blue) individuals. Distributions of these parameters are displayed as violin plots for three conditions; at rest with eyes closed (EC), at rest with eyes open (EO) and during visual stimulation (LU). Aperiodic parameters were calculated across electrodes Fp1 and Fp2. Solid black lines indicate mean values, dotted black lines indicate median values. Coloured lines connect values of individual participants across conditions.

      Further, Glx concentration in the visual cortex did not correlate with the aperiodic intercept in the SC group (Figure 4), suggesting that this relationship was indeed specific to the CC group.

      The data from all electrodes has been analyzed and published in other studies as well (Pant et al., 2023; Ossandón et al., 2023).

      Reviewer #2 (Public Review):

      Summary:

      The manuscript reports non-invasive measures of activity and neurochemical profiles of the visual cortex in congenitally blind patients who recovered vision through the surgical removal of bilateral dense cataracts. The declared aim of the study is to find out how restoring visual function after several months or years of complete blindness impacts the balance between excitation and inhibition in the visual cortex.

      Strengths:

      The findings are undoubtedly useful for the community, as they contribute towards characterising the many ways this special population differs from normally sighted individuals. The combination of MRS and EEG measures is a promising strategy to estimate a fundamental physiological parameter - the balance between excitation and inhibition in the visual cortex, which animal studies show to be heavily dependent upon early visual experience. Thus, the reported results pave the way for further studies, which may use a similar approach to evaluate more patients and control groups.

      Weaknesses:

      (2.1) The main issue is the lack of an appropriate comparison group or condition to delineate the effect of sight recovery (as opposed to the effect of congenital blindness). Few previous studies suggested an increased excitation/Inhibition ratio in the visual cortex of congenitally blind patients; the present study reports a decreased E/I ratio instead. The authors claim that this implies a change of E/I ratio following sight recovery. However, supporting this claim would require showing a shift of E/I after vs. before the sight-recovery surgery, or at least it would require comparing patients who did and did not undergo the sight-recovery surgery (as common in the field).

      Longitudinal studies would indeed be the best way to test the hypothesis that the lower E/I ratio in the CC group observed by the present study is a consequence of sight restoration. However, longitudinal studies involving neuroimaging are an effortful challenge, particularly in research conducted outside of major developed countries and dedicated neuroimaging research facilities. Crucially, however, had CC and SC individuals, as well as permanently congenitally blind vs SC individuals (Coullon et al., 2015; Weaver et al., 2013), not differed on any neurochemical markers, such a longitudinal study might have been trivial. Thus, in order to justify and better tailor longitudinal studies, cross-sectional studies are an initial step.

      (2.2) MR Spectroscopy shows a reduced GLX/GABA ratio in patients vs. sighted controls; however, this finding remains rather isolated, not corroborated by other observations. The difference between patients and controls only emerges for the GLX/GABA ratio, but there is no accompanying difference in either the GLX or the GABA concentrations. There is an attempt to relate the MRS data with acuity measurements and electrophysiological indices, but the explorative correlational analyses do not help to build a coherent picture. A bland correlation between GLX/GABA and visual impairment is reported, but this is specific to the patients' group (N=10) and would not hold across groups (the correlation is positive, predicting the lowest GLX/GABA ratio values for the sighted controls - the opposite of what is found). There is also a strong correlation between GLX concentrations and the EEG power at the lowest temporal frequencies. Although this relation is intriguing, it only holds for a very specific combination of parameters (of the many tested): only with eyes open, only in the patient group.

      We interpret these findings differently, that is, in the context of experiments from non-human animals and the larger MRS literature.

      Homeostatic control of E/I balance assumes that the ratio of excitation (reflected here by Glx) and inhibition (reflected here by GABA+) is regulated. Like prior work (Gao et al., 2024, 2024; Narayan et al., 2022; Perica et al., 2022; Steel et al., 2020; Takado et al., 2022; Takei et al., 2016), we assumed that the ratio of Glx/GABA+ is indicative of E/I balance rather than solely the individual neurotransmitter levels. One of the motivations for assessing the ratio vs the absolute concentration is that as per the underlying E/I balance hypothesis, a change in excitation would cause a concomitant change in inhibition, and vice versa, which has been shown in non-human animal work (Fang et al., 2021; Haider et al., 2006; Tao & Poo, 2005) and modeling research (Vreeswijk & Sompolinsky, 1996; Wu et al., 2022). Importantly, our interpretation of the lower E/I ratio is based not just on the Glx/GABA+ ratio, but also on the steeper EEG aperiodic slope (1-20 Hz).

      As noted in the discussion section and in response 1.4, we did not expect to see a lower Glx/GABA+ ratio in CC individuals. We discuss the possible reasons for the direction of the correlation with visual acuity and aperiodic offset during passive visual stimulation, and offer interpretations and (testable) hypotheses.

      We interpret the direction of the Glx/GABA+ correlation with visual acuity to imply that patients with the highest (compensatory) balancing of the consequences of congenital blindness (hyperexcitation), in light of visual stimulation, are those who recover best. Note, the sighted control group was selected based on their “normal” vision. Thus, clinical visual acuity measures are not expected to sufficiently vary, nor have the resolution to show strong correlations with neurophysiological measures. By contrast, the CC group comprised patients highly varying in visual outcomes, and thus was ideal to investigate such correlations.

      This holds for the correlation between Glx and the aperiodic intercept, as well. Previous work has suggested that the intercept of the aperiodic activity is associated with broadband spiking activity in neural circuits (Manning et al., 2009). Thus, an atypical increase of spiking activity during visual stimulation, as indirectly suggested by “old” non-human primate work on visual deprivation (Hyvärinen et al., 1981) might drive a correlation not observed in healthy populations.

      In the revised manuscript, we will more clearly indicate in the discussion that these are possible post-hoc interpretations. We argue that given the lack of such studies in humans, it is all the more important that extant data be presented completely, even if the direction of the effects is not as expected.

      (2.3) For these reasons, the reported findings do not allow us to draw firm conclusions on the relation between EEG parameters and E/I ratio or on the impact of early (vs. late) visual experience on the excitation/inhibition ratio of the human visual cortex.

      Indeed, the correlations we have tested between the E/I ratio and EEG parameters were exploratory, and have been reported as such. The goal of our study was not to compare the effects of early vs. late visual experience. The goal was to study whether early visual experience is necessary for a typical E/I ratio in visual neural circuits. We provided clear evidence in favor of this hypothesis. Thus, the present results suggest the necessity of investigating the effects of late visual deprivation. In fact, such research is missing in permanent blindness as well.

      Reviewer #3 (Public Review):

      This manuscript examines the impact of congenital visual deprivation on the excitatory/inhibitory (E/I) ratio in the visual cortex using Magnetic Resonance Spectroscopy (MRS) and electroencephalography (EEG) in individuals whose sight was restored. Ten individuals with reversed congenital cataracts were compared to age-matched, normally sighted controls, assessing the cortical E/I balance and its interrelationship to visual acuity. The study reveals that the Glx/GABA ratio in the visual cortex and the intercept and aperiodic signal are significantly altered in those with a history of early visual deprivation, suggesting persistent neurophysiological changes despite visual restoration.

      My expertise is in EEG (particularly in the decomposition of periodic and aperiodic activity) and statistical methods. I have several major concerns in terms of methodological and statistical approaches along with the (over)interpretation of the results. These major concerns are detailed below.

      (3.1) Variability in visual deprivation:

      - The document states a large variability in the duration of visual deprivation (probably also the age at restoration), with significant implications for the sensitivity period's impact on visual circuit development. The variability and its potential effects on the outcomes need thorough exploration and discussion.

      We work with a rare, unique patient population, which makes it difficult to systematically assess the effects of different visual histories while maintaining stringent inclusion criteria such as complete patterned visual deprivation at birth. Regardless, we considered the large variance in age at surgery and time since surgery as supportive of our interpretation: group differences were found despite the large variance in duration of visual deprivation. Moreover, the existing variance was used to explore possible associations between behavior and neural measures, as well as neurochemical and EEG measures.

      In the revised manuscript, we will detail the advantages and disadvantages of our CC sample, with respect to duration of congenital visual deprivation.

      (3.2) Sample size:

      - The small sample size is a major concern as it may not provide sufficient power to detect subtle effects and/or overestimate significant effects, which then tend not to generalize to new data. One of the biggest drivers of the replication crisis in neuroscience.

      We address the small sample size in our discussion, and make clear that small sample sizes were due to the nature of investigations in special populations. It is worth noting that our EEG results fully align with those of a larger sample of CC individuals (Ossandón et al., 2023), giving us confidence in their validity and reproducibility. Moreover, our MRS results and correlations of those with EEG parameters were spatially specific to occipital cortex measures, as predicted.

      The main problem with the correlation analyses between MRS and EEG measures is that the sample size is simply too small to conduct such an analysis. Moreover, it is unclear from the methods section that this analysis was only conducted in the patient group (which the reviewer assumed from the plots), and not explained why this was done only in the patient group. I would highly recommend removing these correlation analyses.

      We marked the correlation analyses as exploratory; note that we do not base most of our discussion on the results of these analyses. As indicated by Reviewer 1, reporting them allows for deriving more precise hypotheses for future studies. It has to be noted that we investigate an extremely rare population, tested outside of major developed economies and dedicated neuroimaging research facilities. In addition to being a rare patient group, these individuals come from poor communities. Therefore, we consider it justified to report these correlations as exploratory, providing direction for future research.

      (3.3) Statistical concerns:

      - The statistical analyses, particularly the correlations drawn from a small sample, may not provide reliable estimates (see https://www.sciencedirect.com/science/article/pii/S0092656613000858, which clearly describes this problem).

      It would undoubtedly be better to have a larger sample size. We nonetheless think it is of value to the research community to publish this dataset, since 10 multimodal data sets from a carefully diagnosed, rare population, representing a human model for the effects of early experience on brain development, are quite a lot. Sample sizes in prior neuroimaging studies in transient blindness have most often ranged from n = 1 to n = 10. They nevertheless provided valuable direction for future research, and integration of results across multiple studies provides scientific insights.

      Identifying possible group differences was the goal of our study, with the correlations being an exploratory analysis, which we have clearly indicated in the methods, results and discussion.

      - Statistical analyses for the MRS: The authors should consider some additional permutation statistics, which are more suitable for small sample sizes. The current statistical model (2x2) design ANOVA is not ideal for such small sample sizes. Moreover, it is unclear why the condition (EO & EC) was chosen as a predictor and not the brain region (visual & frontal) or neurochemicals. Finally, the authors did not provide any information on the alpha level nor any information on correction for multiple comparisons (in the methods section). Finally, even if the groups are matched w.r.t. age, the time between surgery and measurement, the duration of visual deprivation, (and sex?), these should be included as covariates as it has been shown that these are highly related to the measurements of interest (especially for the EEG measurements) and the age range of the current study is large.

      In our ANOVA models, the neurochemicals were the outcome variables, and the conditions were chosen as predictors based on prior work suggesting that Glx/GABA+ might vary with eye closure (Kurcyus et al., 2018). The study was designed based on a hypothesis of group differences localized to the occipital cortex, due to visual deprivation. The frontal cortex voxel was chosen to indicate whether these differences were spatially specific. Therefore, we conducted separate ANOVAs based on this study design.

      In the revised manuscript, we will add permutation analyses for our outcomes, as well as multiple regression models investigating whether the variance in visual history might have driven these results. Note that in the supplementary materials (S6, S7), we have reported the correlations between visual history metrics and MRS/EEG outcomes.
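
      For illustration only, the logic of a permutation test for a group difference can be sketched as follows. This is a simplified one-factor comparison of two group means, not the 2x2 design of the study, and the function name `exact_permutation_p` and all data values are hypothetical. With 10 participants per group, all 184,756 ways of relabeling the pooled observations can be enumerated, so no random sampling is needed.

```python
from itertools import combinations


def exact_permutation_p(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in group means.

    Enumerates every way of assigning the pooled values to two groups of
    the original sizes and counts how often the absolute mean difference
    is at least as large as the observed one.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)

    def mean_diff(sum_a):
        return sum_a / n_a - (total - sum_a) / n_b

    observed = abs(mean_diff(sum(group_a)))
    extreme = 0
    count = 0
    for idx in combinations(range(len(pooled)), n_a):
        diff = abs(mean_diff(sum(pooled[i] for i in idx)))
        if diff >= observed - 1e-12:  # tolerance for floating-point ties
            extreme += 1
        count += 1
    return extreme / count


# Hypothetical ratio values for two small groups (not study data).
group_cc = [2.1, 2.4, 2.2, 2.6, 2.3]
group_sc = [2.9, 3.1, 2.7, 3.4, 3.0]
print(f"exact permutation p = {exact_permutation_p(group_cc, group_sc):.4f}")
```

      Because the reference distribution is built from the data themselves, the test does not rely on the normality assumptions of the ANOVA, which is the motivation for adding it in small samples.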

      The alpha level used for the ANOVA models specified in the methods section was 0.05. The alpha level for the exploratory analyses reported in the main manuscript was 0.008, after correcting for (6) multiple comparisons using the Bonferroni correction, also specified in the methods. Note that the p-values following correction are expressed as multiplied by 6, due to most readers assuming an alpha level of 0.05 (see response regarding large p-values).

      We used a control group matched for age and sex. Moreover, the controls were recruited and tested in the same institutes, using the same setup. We feel that we followed the gold standards for recruiting a healthy control group for a patient group.

      - EEG statistical analyses: The same critique as for the MRS statistical analyses applies to the EEG analysis. In addition: was the 2x3 ANOVA conducted for EO and EC independently? This seems to be inconsistent with the approach in the MRS analyses, in which the authors chose EO & EC as predictors in their 2x2 ANOVA.

      The 2x3 ANOVA was not conducted independently for the eyes open/eyes closed conditions. The ANOVA conducted on the EEG metrics was 2x3 because it had group (CC, SC) and condition (eyes open (EO), eyes closed (EC) and visual stimulation (LU)) as predictors.

      - Figure 4: The authors report a p-value of >0.999 with a correlation coefficient of -0.42 with a sample size of 10 subjects. This can't be correct (it should be around: p = 0.22). All statistical analyses should be checked.

      As specified in the methods and figure legend, the reported p values in Figure 4 have been corrected using the Bonferroni correction, and therefore multiplied by the number of comparisons, leading to the seemingly large values.
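
      The arithmetic behind this can be sketched as follows, using the values quoted by the reviewer (r = -0.42, n = 10) and the six comparisons stated in the methods. The function names are hypothetical, and the uncorrected p-value is obtained here by numerically integrating the t density with the standard library rather than with the software used for the manuscript.

```python
import math


def t_pdf(x, df):
    """Probability density of Student's t distribution."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p_from_r(r, n, steps=2000):
    """Two-sided p-value of a Pearson correlation r from n pairs (|r| < 1)."""
    df = n - 2
    t = abs(r) * math.sqrt(df / (1 - r * r))
    # Simpson's rule for the area under the t density between 0 and t.
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return max(0.0, 1 - 2 * area)


def bonferroni_adjust(p, n_comparisons):
    """Bonferroni-adjusted p-value: multiplied by the number of tests, capped at 1."""
    return min(1.0, p * n_comparisons)


p_raw = two_sided_p_from_r(-0.42, 10)      # about 0.23 before correction
p_adj = bonferroni_adjust(p_raw, 6)        # capped at 1.0 after correction
print(f"uncorrected p = {p_raw:.3f}, Bonferroni-adjusted p = {p_adj:.3f}")
```

      The uncorrected value is close to the reviewer's estimate; multiplying it by six exceeds 1, so the adjusted value is capped, which is why it appears as > 0.999 in the figure. Comparing adjusted p-values against 0.05 is equivalent to comparing uncorrected p-values against 0.05/6 ≈ 0.008.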

      Additionally, to check all statistical analyses, we put the manuscript through an independent Statistics Check (Nuijten & Polanin, 2020) (https://michelenuijten.shinyapps.io/statcheck-web/) and will upload the consistency report with the revised supplementary material.

      - Figure 2c. Eyes closed condition: The highest score of the *Glx/GABA ratio seems to be ~3.6. In subplot 2a, there seem to be 3 subjects that show a Glx/GABA ratio score > 3.6. How can this be explained? There is also a discrepancy for the eyes-closed condition.

      The three subjects that show the Glx/GABA+ ratio > 3.6 in subplot 2a are in the SC group, whereas the correlations plotted in figure 2c are only for the CC group, where the highest score is indeed ~3.6.

      (3.4) Interpretation of aperiodic signal:

      - Several recent papers demonstrated that the aperiodic signal measured in EEG or ECoG is related to various important aspects such as age, skull thickness, electrode impedance, as well as cognition. Thus, currently, very little is known about the underlying effects which influence the aperiodic intercept and slope. The entire interpretation of the aperiodic slope as a proxy for E/I is based on a computational model and simulation (as described in the Gao et al. paper).

      Apart from the modeling work by Gao et al., multiple papers that we have also cited used ECoG, EEG and MEG and showed concomitant changes in aperiodic activity with pharmacological manipulation of the E/I ratio (Colombo et al., 2019; Molina et al., 2020; Muthukumaraswamy & Liley, 2018). Further, several prior studies have interpreted changes in the aperiodic slope as reflective of changes in the E/I ratio, including studies of developmental groups (Favaro et al., 2023; Hill et al., 2022; McSweeney et al., 2023; Schaworonkow & Voytek, 2021) as well as patient groups (Molina et al., 2020; Ostlund et al., 2021).

      In the revised manuscript, we will cite those studies not already included in the introduction.

      - Especially the aperiodic intercept is a very sensitive measure to many influences (e.g. skull thickness, electrode impedance...). As crucial results (correlation aperiodic intercept and MRS measures) are facing this problem, this needs to be reevaluated. It is safer to make statements on the aperiodic slope than intercept. In theory, some of the potentially confounding measures are available to the authors (e.g. skull thickness can be computed from T1w images; electrode impedances are usually acquired alongside the EEG data) and could be therefore controlled.

      All electrophysiological measures indeed depend on parameters such as skull thickness and electrode impedance. As in the extant literature using neurophysiological measures to compare brain function between patient and control groups, we used a control group matched in age and sex, recruited in the same region, tested with the same devices, and analyzed with the same analysis pipeline. For example, impedance was kept below 10 kOhm for all subjects. There is no evidence available suggesting that congenital cataracts are associated with changes in skull thickness that would cause the observed pattern of group results. Moreover, we cannot think of how any of the exploratory correlations between neurophysiological measures and MRS measures could be accounted for by a difference in, for example, skull thickness.

      - The authors wrote: "Higher frequencies (such as 20-40 Hz) have been predominantly associated with local circuit activity and feedforward signaling (Bastos et al., 2018; Van Kerkoerle et al., 2014); the increased 20-40 Hz slope may therefore signal increased spontaneous spiking activity in local networks. We speculate that the steeper slope of the aperiodic activity for the lower frequency range (1-20 Hz) in CC individuals reflects the concomitant increase in inhibition." The authors confuse the interpretation of periodic and aperiodic signals. This section refers to the interpretation of the periodic signal (higher frequencies). This interpretation cannot simply be translated to the aperiodic signal (slope).

      Prior work has not always separated the aperiodic and periodic components, making it unclear what might have driven these effects in our data. The interpretation of the higher frequency range was intended to contrast with the interpretations of lower frequency range, in order to speculate as to why the two aperiodic fits might go in differing directions. We will clarify our interpretation in the revised manuscript. Note that Ossandon et al. reported highly similar results (group differences for CC individuals and for permanently congenitally blind humans) for the aperiodic activity between 20-40 Hz and oscillatory activity in the gamma range. We will allude to these findings in the revised manuscript.

      - The authors further wrote: We used the slope of the aperiodic (1/f) component of the EEG spectrum as an estimate of E/I ratio (Gao et al., 2017; Medel et al., 2020; Muthukumaraswamy & Liley, 2018). This is a highly speculative interpretation with very little empirical evidence. These papers were conducted with ECoG data (mostly in animals) and mostly under anesthesia. Thus, these studies only allow an indirect interpretation by what the 1/f slope in EEG measurements is actually influenced.

      Note that Muthukumaraswamy & Liley (2018) used different types of pharmacological manipulations and analyzed periodic and aperiodic MEG activity in addition to monkey ECoG. Medel et al. (2020; now published as Medel et al., 2023) compared EEG activity in addition to ECoG data after propofol administration. The interpretation of our results is in line with a number of recent studies in developing (Hill et al., 2022; Schaworonkow & Voytek, 2021) and special populations using EEG. As mentioned above, several prior studies have used the slope of the 1/f component/aperiodic activity as an indirect measure of the E/I ratio (Favaro et al., 2023; Hill et al., 2022; McSweeney et al., 2023; Molina et al., 2020; Ostlund et al., 2021; Schaworonkow & Voytek, 2021), including studies using scalp-recorded EEG. We will make clearer in the introduction of the revised manuscript that this metric is indirect.

      While a full understanding of aperiodic activity has yet to be provided, some convergent ideas have emerged. We think that our results contribute to this enterprise, since our study is, to the best of our knowledge, the first to assess both MRS-measured neurotransmitter levels and EEG aperiodic activity.

      (3.5) Problems with EEG preprocessing and analysis:

      - It seems that the authors did not identify bad channels nor address the line noise issue (even a problem if a low pass filter of below-the-line noise was applied).

      As pointed out in the methods and Figure 1, we only analyzed data from two channels, O1 and O2, neither of which was rejected for any participant. Channel rejection was performed for the larger dataset, published elsewhere (Ossandón et al., 2023; Pant et al., 2023).

      In both published works, we did not consider frequency ranges above 40 Hz to avoid any possible contamination with line noise. Here, we focused on activity between 0 and 20 Hz, definitely excluding line noise contaminations. The low pass filter (FIR, 1-45 Hz) guaranteed that any spill-over effects of line noise would be restricted to frequencies just below the upper cutoff frequency.

      Additionally, a prior version of the analysis used the cleanline.m function to remove line noise before filtering, and the group differences remained stable. We will report this analysis in the supplementary material of the revised manuscript. Further, both groups were measured in the same lab, making line noise a highly unlikely explanation for the observed group effects. Finally, any of the exploratory MRS-EEG correlations would be hard to explain if the EEG parameters were contaminated with line noise.

      - What was the percentage of segments that needed to be rejected due to the 120μV criteria? This should be reported specifically for EO & EC and controls and patients.

      The mean percentage of 1-second segments rejected for each resting-state condition is given below. The mean percentages of 6.25-second segments rejected in each group for the visual stimulation condition are also included, and will be added to the revised manuscript:

      Author response table 3.

      - The authors downsampled the data to 60 Hz "to match the stimulation rate". What is the intention of this? Because the subsequent spectral analyses are conflated by this choice (see Nyquist theorem).

      These data were collected as part of a study designed to evoke alpha activity with visual white noise, which ranged in luminance with equal power at all frequencies from 1-60 Hz, restricted by the refresh rate of the monitor on which stimuli were presented (Pant et al., 2023). This paradigm and method were developed by VanRullen and colleagues (Schwenk et al., 2020; Vanrullen & MacDonald, 2012), wherein the analysis requires the same sampling rate between the presented frequencies and the EEG data. The downsampling function used here automatically applies an anti-aliasing filter (EEGLAB 2019).

      - "Subsequently, baseline removal was conducted by subtracting the mean activity across the length of an epoch from every data point." The actual baseline time segment should be specified.

      The time segment was the length of the epoch, that is, 1 second for the resting state conditions and 6.25 seconds for the visual stimulation conditions. This will be explicitly stated in the revised manuscript.
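
      As a minimal sketch of the whole-epoch baseline removal described here (the function name `remove_baseline` and the sample values are hypothetical, and this is not the pipeline code itself), the mean over the entire epoch is subtracted from every sample of that epoch:

```python
def remove_baseline(epoch):
    """Subtract the epoch's own mean from every sample (whole-epoch baseline)."""
    mean = sum(epoch) / len(epoch)
    return [sample - mean for sample in epoch]


# Made-up amplitudes in microvolts for one short epoch, for illustration only.
epoch = [12.0, 15.0, 9.0, 14.0]
corrected = remove_baseline(epoch)
print(corrected)
```

      After this step each epoch has zero mean, so slow offsets between epochs no longer contribute to the spectrum.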

      - "We excluded the alpha range (8-14 Hz) for this fit to avoid biasing the results due to documented differences in alpha activity between CC and SC individuals (Bottari et al., 2016; Ossandón et al., 2023; Pant et al., 2023)." This does not really make sense, as the FOOOF algorithm first fits the 1/f slope, for which the alpha activity is not relevant.

      We did not use the FOOOF algorithm/toolbox in this manuscript. As stated in the methods, we used a 1/f fit to the 1-20 Hz spectrum in the log-log space, and subtracted this fit from the original spectrum to obtain the corrected spectrum. Given the pronounced difference in alpha power between groups (Bottari et al., 2016; Ossandón et al., 2023; Pant et al., 2023), we were concerned it might drive differences in the exponent values. Our analysis pipeline had been adapted from previous publications of our group and other labs (Ossandón et al., 2023; Voytek et al., 2015; Waschke et al., 2017).

      We have conducted the analysis with and without the exclusion of the alpha range, as well as using the FOOOF toolbox both in the 1-20 Hz and 20-40 Hz ranges (Ossandón et al., 2023). The findings of a steeper slope in the 1-20 Hz range as well as lower alpha power in CC vs SC individuals remained stable. In Ossandón et al. (2023), the comparison between the piecewise fits and FOOOF fits led the authors to use the former as it outperformed the FOOOF algorithm for their data.
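
      The fitting procedure described in this response can be sketched as below. This is an illustration under stated assumptions and not the code used for the manuscript: the function names `fit_aperiodic` and `corrected_spectrum` are hypothetical, the fit is an ordinary least-squares line in log10-log10 space, the boundaries of the excluded 8-14 Hz band are treated as inclusive, and the spectrum is synthetic (a 1/f^1.5 background with a bump at 10 Hz).

```python
import math


def fit_aperiodic(freqs, power, fit_range=(1.0, 20.0), exclude=(8.0, 14.0)):
    """Least-squares line through log10(power) versus log10(frequency).

    Frequencies outside fit_range or inside the excluded band are ignored.
    Returns (intercept, slope) of the line in log-log space.
    """
    xs, ys = [], []
    for f, p in zip(freqs, power):
        in_range = fit_range[0] <= f <= fit_range[1]
        in_excluded = exclude[0] <= f <= exclude[1]
        if in_range and not in_excluded:
            xs.append(math.log10(f))
            ys.append(math.log10(p))
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return intercept, slope


def corrected_spectrum(freqs, power, intercept, slope):
    """Log-power minus the fitted aperiodic line at every frequency."""
    return [math.log10(p) - (intercept + slope * math.log10(f))
            for f, p in zip(freqs, power)]


# Synthetic spectrum for illustration only: 1/f^1.5 plus an alpha bump at 10 Hz.
freqs = [float(f) for f in range(1, 21)]
power = [100.0 * f ** -1.5 + 5.0 * math.exp(-((f - 10.0) ** 2) / 2.0)
         for f in freqs]

intercept, slope = fit_aperiodic(freqs, power)
residual = corrected_spectrum(freqs, power, intercept, slope)
print(round(slope, 2), round(intercept, 2), round(residual[9], 2))
```

      Because the alpha band is left out of the fit, the bump at 10 Hz barely changes the estimated slope and instead shows up as a positive residual in the corrected spectrum, which is the concern the exclusion was meant to address.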

      - The model fits of the 1/f fitting for EO, EC, and both participant groups should be reported.

      In Figure 3 of the manuscript, we depicted the mean spectra and 1/f fits for each group. We will add the fit quality metrics and show individual subjects’ fits in the revised manuscript.

      (3.6) Validity of GABA measurements and results:

      - According to a newer study by the authors of the Gannet toolbox (https://analyticalsciencejournals.onlinelibrary.wiley.com/doi/abs/10.1002/nbm.5076), the reliability and reproducibility of the gamma-aminobutyric acid (GABA) measurement can vary significantly depending on acquisition and modeling parameters. Thus, did the authors address these challenges?

      We took care of data quality while acquiring MRS data by ensuring appropriate voxel placement and linewidth prior to scanning. Acquisition as well as modeling parameters were constant for both groups, so they cannot have driven group differences.

      The linked article compares the reproducibility of GABA measurement using Osprey, which was released in 2020 and uses linear combination modeling to fit the peak as opposed to Gannet’s simple peak fitting (Hupfeld et al., 2024). The study finds better test-retest reliability for Osprey compared to Gannet’s method.

      As the present work was conceptualized in 2018, we used Gannet 3.0, which was the state-of-the-art edited spectral analysis toolbox at the time, and still is widely used. In the revised manuscript, we will include a supplementary section reanalyzing the main findings with Osprey.

      - Furthermore, the authors wrote: "We confirmed the within-subject stability of metabolite quantification by testing a subset of the sighted controls (n=6) 2-4 weeks apart." Looking at the supplementary Figure 5 (which would be rather plotted as ICC or Bland-Altman plots), the within-subject stability compared to between-subject variability seems not to be great. Furthermore, I don't think such a small sample size qualifies for a rigorous assessment of stability.

      Indeed, we did not intend to provide a rigorous assessment of within-subject stability. Rather, we aimed to confirm that data quality/concentration ratios did not systematically differ between the same subjects tested longitudinally; driven, for example, by scanner heating or time of day. As with the phantom testing, we attempted to give readers an idea of the quality of the data, as they were collected from a primarily clinical rather than a research site.

      In the revised manuscript we will remove the statement regarding stability, and add the Bland-Altman plot.
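
      For readers unfamiliar with the Bland-Altman approach, the quantities behind such a plot can be sketched as follows. The function name `bland_altman` is hypothetical and the two lists are placeholder numbers for six subjects, not data from the study; the sketch assumes the conventional 95% limits of agreement, defined as the mean difference plus or minus 1.96 standard deviations of the differences.

```python
import statistics


def bland_altman(first, second):
    """Return per-subject means, differences, the bias and the limits of agreement."""
    diffs = [a - b for a, b in zip(first, second)]
    means = [(a + b) / 2.0 for a, b in zip(first, second)]
    bias = statistics.mean(diffs)       # systematic offset between sessions
    sd = statistics.stdev(diffs)        # sample standard deviation of differences
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    return means, diffs, bias, limits


# Placeholder ratio values for two sessions (illustrative only).
session_1 = [3.0, 3.2, 2.9, 3.4, 3.1, 3.3]
session_2 = [3.1, 3.0, 3.0, 3.3, 3.2, 3.1]

means, diffs, bias, limits = bland_altman(session_1, session_2)
print(round(bias, 3), [round(x, 3) for x in limits])
```

      In the plot, each subject's difference is drawn against the subject's mean, with horizontal lines at the bias and at the two limits; a bias close to zero indicates no systematic shift between sessions.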

      - "Why might an enhanced inhibitory drive, as indicated by the lower Glx/GABA ratio" Is this interpretation really warranted, as the results of the group differences in the Glx/GABA ratio seem to be rather driven by a decreased Glx concentration in CC rather than an increased GABA (see Figure 2).

      We used the Glx/GABA+ ratio as a measure, rather than individual Glx or GABA+ concentration, which did not significantly differ between groups. As detailed in Response 2.2, we think this metric aligns better with an underlying E/I balance hypothesis and has been used in many previous studies (Gao et al., 2024; Liu et al., 2015; Narayan et al., 2022; Perica et al., 2022).

      Our interpretation of an enhanced inhibitory drive additionally comes from the combination of aperiodic EEG (1-20 Hz) and MRS measures, which, when considered together, are consistent with a decreased E/I ratio.

      In the revised manuscript, we will rephrase this sentence accordingly. 

      - Glx concentration predicted the aperiodic intercept in CC individuals' visual cortices during ambient and flickering visual stimulation. Why specifically investigate the Glx concentration, when the paper is about E/I ratio?

      As stated in the methods, we exploratorily assessed the relationship between all MRS parameters (Glx, GABA+ and Glx/GABA+ ratio) with the aperiodic parameters (slope, offset), and corrected for multiple comparisons accordingly. We think this is a worthwhile analysis considering the rarity of the dataset/population (see 1.2, 1.6, 2.1 and reviewer 1’s comments about future hypotheses). We only report the Glx – aperiodic intercept correlation in the main manuscript as it survived correction for multiple comparisons.

      (3.7) Interpretation of the correlation between MRS measurements and EEG aperiodic signal:

      - The authors wrote: "The intercept of the aperiodic activity was highly correlated with the Glx concentration during rest with eyes open and during flickering stimulation (also see Supplementary Material S11). Based on the assumption that the aperiodic intercept reflects broadband firing (Manning et al., 2009; Winawer et al., 2013), this suggests that the Glx concentration might be related to broadband firing in CC individuals during active and passive visual stimulation." These results should not be interpreted (or only with great caution) for several reasons (see also problem with influences on aperiodic intercept and small sample size). This is a result of the exploratory analyses of correlating every EEG parameter with every MRS parameter. This requires well-powered replication before any interpretation can be provided. Furthermore and importantly: why should this be specifically only in CC patients, but not in the SC control group?

      We indicate clearly in all parts of the manuscript that these correlations are presented as exploratory. Further, we interpret the Glx-aperiodic offset correlation, and none of the others, as it survived the Bonferroni correction for multiple comparisons. We offer a hypothesis in the discussion section as to why such a correlation might exist in the CC but not the SC group (see response 2.2), and do not speculate further.

      (3.8) Language and presentation:

      - The manuscript requires language improvements and correction of numerous typos. Over-simplifications and unclear statements are present, which could mislead or confuse readers (see also interpretation of aperiodic signal).

      In the revision, we will check that speculations are clearly marked and typos are removed.

      - The authors state that "Together, the present results provide strong evidence for experience-dependent development of the E/I ratio in the human visual cortex, with consequences for behavior." The results of the study do not provide any strong evidence, because of the small sample size and exploratory analyses approach and not accounting for possible confounding factors.

      We disagree with this statement and allude to convergent evidence of both MRS and neurophysiological measures. The latter link to corresponding results observed in a larger sample of CC individuals (Ossandón et al., 2023).

      - "Our results imply a change in neurotransmitter concentrations as a consequence of *restoring* vision following congenital blindness." This is a speculative statement to infer a causal relationship on cross-sectional data.

      As mentioned under 2.1, we conducted a cross-sectional study which might justify future longitudinal work. In order to advance science, we put forward new testable hypotheses at the end of the manuscript.

      In the revised manuscript we will add “might imply” to better indicate the hypothetical character of this idea.

      - In the limitation section, the authors wrote: "The sample size of the present study is relatively high for the rare population, but undoubtedly, overall, rather small." This sentence should be rewritten, as the study is plainly underpowered. The further justification "We nevertheless think that our results are valid. Our findings neurochemically (Glx and GABA+ concentration), and anatomically (visual cortex) specific. The MRS parameters varied with parameters of the aperiodic EEG activity and visual acuity. The group differences for the EEG assessments corresponded to those of a larger sample of CC individuals (n=38) (Ossandón et al., 2023), and effects of chronological age were as expected from the literature." These statements do not provide any validation or justification of small samples. Furthermore, the current data set is a subset of an earlier published paper by the same authors "The EEG data sets reported here were part of data published earlier (Ossandón et al., 2023; Pant et al., 2023)." Thus, the statement "The group differences for the EEG assessments corresponded to those of a larger sample of CC individuals (n=38)" is a circular argument and should be avoided.

      Our intention was not to justify having a small sample, but to justify why we think the results might be valid as they align with/replicate existing literature.

      In the revised manuscript, we will add a figure showing that the EEG results of the 10 subjects considered here correspond to those of the 28 other subjects of Ossandón et al. (2023). We will adapt the text accordingly, clearly stating that the pattern of EEG results of the ten subjects reported here replicates that of the 28 additional subjects of Ossandón et al. (2023).

      References

      Barnes, S. J., Sammons, R. P., Jacobsen, R. I., Mackie, J., Keller, G. B., & Keck, T. (2015). Subnetwork-specific homeostatic plasticity in mouse visual cortex in vivo. Neuron, 86(5), 1290–1303. https://doi.org/10.1016/J.NEURON.2015.05.010

      Bernabeu, A., Alfaro, A., García, M., & Fernández, E. (2009). Proton magnetic resonance spectroscopy (1H-MRS) reveals the presence of elevated myo-inositol in the occipital cortex of blind subjects. NeuroImage, 47(4), 1172–1176. https://doi.org/10.1016/j.neuroimage.2009.04.080

      Bottari, D., Troje, N. F., Ley, P., Hense, M., Kekunnaya, R., & Röder, B. (2016). Sight restoration after congenital blindness does not reinstate alpha oscillatory activity in humans. Scientific Reports. https://doi.org/10.1038/srep24683

      Colombo, M. A., Napolitani, M., Boly, M., Gosseries, O., Casarotto, S., Rosanova, M., Brichant, J. F., Boveroux, P., Rex, S., Laureys, S., Massimini, M., Chieregato, A., & Sarasso, S. (2019). The spectral exponent of the resting EEG indexes the presence of consciousness during unresponsiveness induced by propofol, xenon, and ketamine. NeuroImage, 189, 631–644. https://doi.org/10.1016/j.neuroimage.2019.01.024

      Consideration of Sample Size in Neuroscience Studies. (2020). Journal of Neuroscience, 40(21), 4076–4077. https://doi.org/10.1523/JNEUROSCI.0866-20.2020

      Coullon, G. S. L., Emir, U. E., Fine, I., Watkins, K. E., & Bridge, H. (2015). Neurochemical changes in the pericalcarine cortex in congenital blindness attributable to bilateral anophthalmia. Journal of Neurophysiology. https://doi.org/10.1152/jn.00567.2015

      Fang, Q., Li, Y. T., Peng, B., Li, Z., Zhang, L. I., & Tao, H. W. (2021). Balanced enhancements of synaptic excitation and inhibition underlie developmental maturation of receptive fields in the mouse visual cortex. Journal of Neuroscience, 41(49), 10065–10079. https://doi.org/10.1523/JNEUROSCI.0442-21.2021

      Favaro, J., Colombo, M. A., Mikulan, E., Sartori, S., Nosadini, M., Pelizza, M. F., Rosanova, M., Sarasso, S., Massimini, M., & Toldo, I. (2023). The maturation of aperiodic EEG activity across development reveals a progressive differentiation of wakefulness from sleep. NeuroImage, 277. https://doi.org/10.1016/J.NEUROIMAGE.2023.120264

      Gao, Y., Liu, Y., Zhao, S., Liu, Y., Zhang, C., Hui, S., Mikkelsen, M., Edden, R. A. E., Meng, X., Yu, B., & Xiao, L. (2024). MRS study on the correlation between frontal GABA+/Glx ratio and abnormal cognitive function in medication-naive patients with narcolepsy. Sleep Medicine, 119, 1–8. https://doi.org/10.1016/j.sleep.2024.04.004

      Haider, B., Duque, A., Hasenstaub, A. R., & McCormick, D. A. (2006). Neocortical network activity in vivo is generated through a dynamic balance of excitation and inhibition. Journal of Neuroscience. https://doi.org/10.1523/JNEUROSCI.5297-05.2006

      Hill, A. T., Clark, G. M., Bigelow, F. J., Lum, J. A. G., & Enticott, P. G. (2022). Periodic and aperiodic neural activity displays age-dependent changes across early-to-middle childhood. Developmental Cognitive Neuroscience, 54, 101076. https://doi.org/10.1016/J.DCN.2022.101076

      Hupfeld, K. E., Zöllner, H. J., Hui, S. C. N., Song, Y., Murali-Manohar, S., Yedavalli, V., Oeltzschner, G., Prisciandaro, J. J., & Edden, R. A. E. (2024). Impact of acquisition and modeling parameters on the test–retest reproducibility of edited GABA+. NMR in Biomedicine, 37(4), e5076. https://doi.org/10.1002/nbm.5076

      Hyvärinen, J., Carlson, S., & Hyvärinen, L. (1981). Early visual deprivation alters modality of neuronal responses in area 19 of monkey cortex. Neuroscience Letters, 26(3), 239–243. https://doi.org/10.1016/0304-3940(81)90139-7

      Juchem, C., & de Graaf, R. A. (2017). B0 magnetic field homogeneity and shimming for in vivo magnetic resonance spectroscopy. Analytical Biochemistry, 529, 17–29. https://doi.org/10.1016/j.ab.2016.06.003

      Keck, T., Hübener, M., & Bonhoeffer, T. (2017). Interactions between synaptic homeostatic mechanisms: An attempt to reconcile BCM theory, synaptic scaling, and changing excitation/inhibition balance. Current Opinion in Neurobiology, 43, 87–93. https://doi.org/10.1016/J.CONB.2017.02.003

      Kurcyus, K., Annac, E., Hanning, N. M., Harris, A. D., Oeltzschner, G., Edden, R., & Riedl, V. (2018). Opposite Dynamics of GABA and Glutamate Levels in the Occipital Cortex during Visual Processing. Journal of Neuroscience, 38(46), 9967–9976. https://doi.org/10.1523/JNEUROSCI.1214-18.2018

      Liu, B., Wang, G., Gao, D., Gao, F., Zhao, B., Qiao, M., Yang, H., Yu, Y., Ren, F., Yang, P., Chen, W., & Rae, C. D. (2015). Alterations of GABA and glutamate-glutamine levels in premenstrual dysphoric disorder: A 3T proton magnetic resonance spectroscopy study. Psychiatry Research - Neuroimaging, 231(1), 64–70. https://doi.org/10.1016/J.PSCYCHRESNS.2014.10.020

      Lunghi, C., Berchicci, M., Morrone, M. C., & Russo, F. D. (2015). Short‐term monocular deprivation alters early components of visual evoked potentials. The Journal of Physiology, 593(19), 4361. https://doi.org/10.1113/JP270950

      Maier, S., Düppers, A. L., Runge, K., Dacko, M., Lange, T., Fangmeier, T., Riedel, A., Ebert, D., Endres, D., Domschke, K., Perlov, E., Nickel, K., & Tebartz van Elst, L. (2022). Increased prefrontal GABA concentrations in adults with autism spectrum disorders. Autism Research, 15(7), 1222–1236. https://doi.org/10.1002/aur.2740

      Manning, J. R., Jacobs, J., Fried, I., & Kahana, M. J. (2009). Broadband shifts in local field potential power spectra are correlated with single-neuron spiking in humans. Journal of Neuroscience, 29(43), 13613–13620. https://doi.org/10.1523/JNEUROSCI.2041-09.2009

      McSweeney, M., Morales, S., Valadez, E. A., Buzzell, G. A., Yoder, L., Fifer, W. P., Pini, N., Shuffrey, L. C., Elliott, A. J., Isler, J. R., & Fox, N. A. (2023). Age-related trends in aperiodic EEG activity and alpha oscillations during early- to middle-childhood. NeuroImage, 269, 119925. https://doi.org/10.1016/j.neuroimage.2023.119925

      Medel, V., Irani, M., Crossley, N., Ossandón, T., & Boncompte, G. (2023). Complexity and 1/f slope jointly reflect brain states. Scientific Reports, 13(1), 21700. https://doi.org/10.1038/s41598-023-47316-0

      Medel, V., Irani, M., Ossandón, T., & Boncompte, G. (2020). Complexity and 1/f slope jointly reflect cortical states across different E/I balances. bioRxiv, 2020.09.15.298497. https://doi.org/10.1101/2020.09.15.298497

      Molina, J. L., Voytek, B., Thomas, M. L., Joshi, Y. B., Bhakta, S. G., Talledo, J. A., Swerdlow, N. R., & Light, G. A. (2020). Memantine Effects on Electroencephalographic Measures of Putative Excitatory/Inhibitory Balance in Schizophrenia. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging, 5(6), 562–568. https://doi.org/10.1016/j.bpsc.2020.02.004

      Mukerji, A., Byrne, K. N., Yang, E., Levi, D. M., & Silver, M. A. (2022). Visual cortical γ−aminobutyric acid and perceptual suppression in amblyopia. Frontiers in Human Neuroscience, 16. https://doi.org/10.3389/fnhum.2022.949395

      Muthukumaraswamy, S. D., & Liley, D. T. (2018). 1/f electrophysiological spectra in resting and drug-induced states can be explained by the dynamics of multiple oscillatory relaxation processes. NeuroImage, 179, 582–595. https://doi.org/10.1016/j.neuroimage.2018.06.068

      Narayan, G. A., Hill, K. R., Wengler, K., He, X., Wang, J., Yang, J., Parsey, R. V., & DeLorenzo, C. (2022). Does the change in glutamate to GABA ratio correlate with change in depression severity? A randomized, double-blind clinical trial. Molecular Psychiatry, 27(9), 3833–3841. https://doi.org/10.1038/s41380-022-01730-4

      Nuijten, M. B., & Polanin, J. R. (2020). “statcheck”: Automatically detect statistical reporting inconsistencies to increase reproducibility of meta-analyses. Research Synthesis Methods, 11(5), 574–579. https://doi.org/10.1002/jrsm.1408

      Ossandón, J. P., Stange, L., Gudi-Mindermann, H., Rimmele, J. M., Sourav, S., Bottari, D., Kekunnaya, R., & Röder, B. (2023). The development of oscillatory and aperiodic resting state activity is linked to a sensitive period in humans. NeuroImage, 275, 120171. https://doi.org/10.1016/J.NEUROIMAGE.2023.120171

      Ostlund, B. D., Alperin, B. R., Drew, T., & Karalunas, S. L. (2021). Behavioral and cognitive correlates of the aperiodic (1/f-like) exponent of the EEG power spectrum in adolescents with and without ADHD. Developmental Cognitive Neuroscience, 48, 100931. https://doi.org/10.1016/j.dcn.2021.100931

      Pant, R., Ossandón, J., Stange, L., Shareef, I., Kekunnaya, R., & Röder, B. (2023). Stimulus-evoked and resting-state alpha oscillations show a linked dependence on patterned visual experience for development. NeuroImage: Clinical, 103375. https://doi.org/10.1016/J.NICL.2023.103375

      Perica, M. I., Calabro, F. J., Larsen, B., Foran, W., Yushmanov, V. E., Hetherington, H., Tervo-Clemmens, B., Moon, C.-H., & Luna, B. (2022). Development of frontal GABA and glutamate supports excitation/inhibition balance from adolescence into adulthood. Progress in Neurobiology, 219, 102370. https://doi.org/10.1016/j.pneurobio.2022.102370

      Pitchaimuthu, K., Wu, Q. Z., Carter, O., Nguyen, B. N., Ahn, S., Egan, G. F., & McKendrick, A. M. (2017). Occipital GABA levels in older adults and their relationship to visual perceptual suppression. Scientific Reports, 7(1). https://doi.org/10.1038/S41598-017-14577-5

      Rideaux, R., Ehrhardt, S. E., Wards, Y., Filmer, H. L., Jin, J., Deelchand, D. K., Marjańska, M., Mattingley, J. B., & Dux, P. E. (2022). On the relationship between GABA+ and glutamate across the brain. NeuroImage, 257, 119273. https://doi.org/10.1016/J.NEUROIMAGE.2022.119273

      Schaworonkow, N., & Voytek, B. (2021). Longitudinal changes in aperiodic and periodic activity in electrophysiological recordings in the first seven months of life. Developmental Cognitive Neuroscience, 47. https://doi.org/10.1016/j.dcn.2020.100895

      Schwenk, J. C. B., VanRullen, R., & Bremmer, F. (2020). Dynamics of Visual Perceptual Echoes Following Short-Term Visual Deprivation. Cerebral Cortex Communications, 1(1). https://doi.org/10.1093/TEXCOM/TGAA012

      Sengpiel, F., Jirmann, K.-U., Vorobyov, V., & Eysel, U. T. (2006). Strabismic Suppression Is Mediated by Inhibitory Interactions in the Primary Visual Cortex. Cerebral Cortex, 16(12), 1750–1758. https://doi.org/10.1093/cercor/bhj110

      Steel, A., Mikkelsen, M., Edden, R. A. E., & Robertson, C. E. (2020). Regional balance between glutamate+glutamine and GABA+ in the resting human brain. NeuroImage, 220. https://doi.org/10.1016/J.NEUROIMAGE.2020.117112

      Takado, Y., Takuwa, H., Sampei, K., Urushihata, T., Takahashi, M., Shimojo, M., Uchida, S., Nitta, N., Shibata, S., Nagashima, K., Ochi, Y., Ono, M., Maeda, J., Tomita, Y., Sahara, N., Near, J., Aoki, I., Shibata, K., & Higuchi, M. (2022). MRS-measured glutamate versus GABA reflects excitatory versus inhibitory neural activities in awake mice. Journal of Cerebral Blood Flow & Metabolism, 42(1), 197. https://doi.org/10.1177/0271678X211045449

      Takei, Y., Fujihara, K., Tagawa, M., Hironaga, N., Near, J., Kasagi, M., Takahashi, Y., Motegi, T., Suzuki, Y., Aoyama, Y., Sakurai, N., Yamaguchi, M., Tobimatsu, S., Ujita, K., Tsushima, Y., Narita, K., & Fukuda, M. (2016). The inhibition/excitation ratio related to task-induced oscillatory modulations during a working memory task: A multimodal-imaging study using MEG and MRS. NeuroImage, 128, 302–315. https://doi.org/10.1016/J.NEUROIMAGE.2015.12.057

      Tao, H. W., & Poo, M. M. (2005). Activity-dependent matching of excitatory and inhibitory inputs during refinement of visual receptive fields. Neuron, 45(6), 829–836. https://doi.org/10.1016/J.NEURON.2005.01.046

      Vanrullen, R., & MacDonald, J. S. P. (2012). Perceptual echoes at 10 Hz in the human brain. Current Biology. https://doi.org/10.1016/j.cub.2012.03.050

      Voytek, B., Kramer, M. A., Case, J., Lepage, K. Q., Tempesta, Z. R., Knight, R. T., & Gazzaley, A. (2015). Age-related changes in 1/f neural electrophysiological noise. Journal of Neuroscience, 35(38). https://doi.org/10.1523/JNEUROSCI.2332-14.2015

      Vreeswijk, C. V., & Sompolinsky, H. (1996). Chaos in neuronal networks with balanced excitatory and inhibitory activity. Science, 274(5293), 1724–1726. https://doi.org/10.1126/SCIENCE.274.5293.1724

      Waschke, L., Wöstmann, M., & Obleser, J. (2017). States and traits of neural irregularity in the age-varying human brain. Scientific Reports, 7(1), 1–12. https://doi.org/10.1038/s41598-017-17766-4

      Weaver, K. E., Richards, T. L., Saenz, M., Petropoulos, H., & Fine, I. (2013). Neurochemical changes within human early blind occipital cortex. Neuroscience. https://doi.org/10.1016/j.neuroscience.2013.08.004

      Wu, Y. K., Miehl, C., & Gjorgjieva, J. (2022). Regulation of circuit organization and function through inhibitory synaptic plasticity. Trends in Neurosciences, 45(12), 884–898. https://doi.org/10.1016/J.TINS.2022.10.006

    1. Author response:

      Reviewer #1 (Public review):

      (1) Legionella effectors are often activated by binding to eukaryote-specific host factors, including actin. The authors should test the following: a) whether Lfat1 can fatty acylate small G-proteins in vitro; b) whether this activity is dependent on actin binding; and c) whether expression of the Y240A mutant in mammalian cells affects the fatty acylation of Rac3 (Figure 6B), or other small G-proteins.

      We were not able to express and purify full-length recombinant Lfat1 to perform fatty acylation of small GTPases in vitro. However, when overexpressed in cellulo, the Y240A mutant retained the ability to fatty acylate Rac3 and another small GTPase, RheB (see Author response image 1 below). We postulate that under infection conditions, actin binding might be required to fatty acylate certain GTPases because only a small amount of effector protein is secreted into the host cell.

      Author response image 1.

      (2) It should be demonstrated that lysine residues on small G-proteins are indeed targeted by Lfat1. Ideally, the functional consequences of these modifications should also be investigated. For example, does fatty acylation of G-proteins affect GTPase activity or binding to downstream effectors?

      We mutated K178 on RheB and showed that this mutation abolished its fatty acylation by Lfat1 (see Author response image 2 below). We were not able to test whether fatty acylation by Lfat1 affects downstream effector binding.

      Author response image 2.

      (3) Line 138: Can the authors clarify whether the Lfat1 ABD induces bundling of F-actin filaments or promotes actin oligomerization? Does the Lfat1 ABD form multimers that bring multiple filaments together? If Lfat1 induces actin oligomerization, this effect should be experimentally tested and reported. Additionally, the impact of Lfat1 binding on actin filament stability should be assessed. This is particularly important given the proposed use of the ABD as an actin probe.

      The ABD does not form oligomers, as evidenced by its gel filtration profile. However, we do see F-actin bundling in our in vitro F-actin polymerization experiment when both actin and the ABD are at high concentration (data not shown). At low ABD concentrations, there is no aggregation or bundling of F-actin.

      (4) Line 180: I think it's too premature to refer to the interaction as having "high specificity and affinity." We really don't know what else it's binding to.

      We have revised the text and reworded the sentence by removing "high specificity and affinity."

      (5) The authors should reconsider the color scheme used in the structural figures, particularly in Figures 2D and S4.

      We are not sure what the reviewer's specific concern about the color scheme of the structural figures is, and would welcome clarification.

      (6) In Figure 3E, the WT curve fits the data poorly, possibly because the actin concentration exceeds the Kd of the interaction. It might fit better to a quadratic.

      We have performed quadratic fitting and replaced Figure 3E.
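      As a minimal sketch of the distinction raised in this point, the snippet below contrasts a simple hyperbolic binding curve with the quadratic (ligand-depletion) form, which does not assume that the free concentration of the titrated species equals its total concentration. The function names, the one-site 1:1 binding model, and the numerical values are illustrative assumptions; they are not taken from the manuscript and do not reproduce the fit in Figure 3E.

```python
import numpy as np

def fraction_bound_hyperbolic(titrant_total, kd):
    """Simple hyperbola; valid only when the fixed species is far below Kd."""
    titrant_total = np.asarray(titrant_total, dtype=float)
    return titrant_total / (kd + titrant_total)

def fraction_bound_quadratic(titrant_total, fixed_total, kd):
    """Fraction of the fixed species in complex for 1:1 binding.

    Solves C^2 - (T + F + Kd) * C + T * F = 0 for the complex
    concentration C, taking the physically meaningful (smaller) root.
    """
    titrant_total = np.asarray(titrant_total, dtype=float)
    s = titrant_total + fixed_total + kd
    complex_conc = (s - np.sqrt(s**2 - 4.0 * titrant_total * fixed_total)) / 2.0
    return complex_conc / fixed_total

if __name__ == "__main__":
    # Hypothetical titration: all concentrations in the same (arbitrary) unit.
    titrant = np.array([0.1, 0.3, 1.0, 3.0, 10.0])
    print(fraction_bound_hyperbolic(titrant, kd=1.0))
    print(fraction_bound_quadratic(titrant, fixed_total=1.0, kd=1.0))
```

      When the fixed species is present at concentrations comparable to or above the Kd, the two curves diverge, which is one plausible reason a hyperbolic fit can look poor in that regime.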

      (7) The authors propose that the individual helices of the Lfat1 ABD could be expressed on separate proteins and used to target multi-component biological complexes to F-actin by genetically fusing each component to a split alpha-helix. This is an intriguing idea, but it should be tested as a proof of concept to support its feasibility and potential utility.

      It is a good suggestion. We plan to thoroughly test the feasibility of this idea as one of our future directions.

      (8) The plot in Figure S2D appears cropped on the X-axis or was generated from a ~2× binned map rather than the deposited one (pixel size ~0.83 Å, plot suggests ~1.6 Å). The reported pixel size is inconsistent between the Methods and Table 1; please clarify whether 0.83 Å refers to super-resolution.

      Yes, 0.83 Å is the super-resolution pixel size. We have updated the cryo-EM table accordingly.

      Reviewer #2 (Public review):

      Weaknesses:

      (1) The authors should use biochemical reactions to analyze the KFAT activity of Lfat1 on one or two small GTPases shown to be modified by this effector in cellulo. Such reactions may allow them to determine the role of actin binding in its biochemical activity. This notion is particularly relevant in light of recent studies showing that actin is a co-factor for the activity of LnaB and Ceg14 (PMID: 39009586; PMID: 38776962; PMID: 40394005). In addition, the study should be discussed in the context of these recent findings on the role of actin in the activity of L. pneumophila effectors.

      We have new data showing that actin binding does not affect Lfat1 enzymatic activity (see the figure in our response to Reviewer #1). We have added these data to the paper as Figure S7. Accordingly, we also revised the discussion by adding the following paragraph.

      “The discovery of Lfat1 as an F-actin–binding lysine fatty acyl transferase raised the intriguing question of whether its enzymatic activity depends on F-actin binding. Recent studies have shown that other Legionella effectors, such as LnaB and Ceg14, use actin as a co-factor to regulate their activities. For instance, LnaB binds monomeric G-actin to enhance its phosphoryl-AMPylase activity toward phosphorylated residues, resulting in unique ADPylation modifications in host proteins (Fu et al, 2024; Wang et al, 2024). Similarly, Ceg14 is activated by host actin to convert ATP and dATP into adenosine and deoxyadenosine monophosphate, thereby modulating ATP levels in L. pneumophila–infected cells (He et al, 2025). However, this does not appear to be the case for Lfat1. We found that Lfat1 mutants defective in F-actin binding retained the ability to modify host small GTPases when expressed in cells (Figure S7). These findings suggest that, rather than serving as a co-factor, F-actin may serve to localize Lfat1 via its actin-binding domain (ABD), thereby confining its activity to regions enriched in F-actin and enabling spatial specificity in the modification of host targets.”

      (2) The development of the ABD domain of Lfat1 as an F-actin probe is a nice extension of the biochemical and structural experiments. The authors need to compare the new probe to commonly used ones, such as Lifeact, in labeling the actin cytoskeleton.

      We fully agree with the reviewer’s insightful suggestion. However, a direct comparison of the Lfat1 ABD domain with commonly used actin probes such as Lifeact, as well as evaluation of the split α-helix probe (as suggested by Reviewer #1), would require extensive and technically demanding experiments. These are important directions that we plan to pursue in future studies.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study reveals that TRPV1 signaling plays a key role in tympanic membrane (TM) healing by promoting macrophage recruitment and angiogenesis. Using a mouse TM perforation model, researchers found that blood-derived macrophages accumulated near the wound, driving angiogenesis and repair. TRPV1-expressing nerve fibers triggered neuroinflammatory responses, facilitating macrophage recruitment. Genetic Trpv1 mutation reduced macrophage infiltration, angiogenesis, and delayed healing. These findings suggest that targeting TRPV1 or stimulating sensory nerve fibers could enhance TM repair, improve blood flow, and prevent infections. This offers new therapeutic strategies for TM perforations and otitis media in clinical settings. This is an excellent and high-quality study that provides valuable insights into the mechanisms underlying TM wound healing.

      Strengths:

      The work is particularly important for elucidating the cellular and molecular processes involved in TM repair. However, there are several concerns about the current version.

      We sincerely thank Reviewer #1 for their time and effort in evaluating and improving our study. Below, we are pleased to address the Reviewer's concerns point by point.

      Weaknesses:

      Major concerns

      (1) The method of administration will be a critical factor when considering potential therapeutic strategies to promote TM healing. It would be beneficial if the authors could discuss possible delivery methods, such as topical application, transtympanic injection, or systemic administration, and their respective advantages and limitations for targeting TRPV1 signaling. For example, Dr. Kanemaru and his colleagues have proposed the use of Trafermin and Spongel to regenerate the eardrum.

      We are grateful to the reviewer for raising this important point. While the present study primarily focuses on the mechanistic role of TRPV1 in TM repair, we agree that the mode of therapeutic delivery will be pivotal in translating these findings into clinical practice. In response, we will expand the discussion to explore possible delivery methods—such as topical application, transtympanic injection, and systemic routes—along with their respective benefits and challenges. We will also cite the work by Dr. Kanemaru and colleagues as an example of how local delivery systems may facilitate TM regeneration.

      (2) The authors appear to have used surface imaging techniques to observe the TM. However, the TM consists of three distinct layers: the epithelial layer, the fibrous middle layer, and the inner mucosal layer. The authors should clarify whether the proposed mechanism involving TRPV1-mediated macrophage recruitment and angiogenesis is limited to the epithelial layer or if it extends to the deeper layers of the TM.

      We apologize for any confusion caused by our previous description. In our study, we utilized Z-stack confocal imaging to capture the full thickness of the TM, as illustrated in Author response image 1 (reconstructed from the acquired Z-sections). This imaging technique allowed us to encompass all three layers of the TM. Each sample was imaged using a 10X objective on an Olympus fluorescence microscope. Given the conical shape and size of the TM, we imaged it in four quadrants, acquiring approximately 30 optical sections (with a 3 µm step) per region. Each acquired image stack was projected and exported using FV10ASW 4.2 Viewer, and the projections were then stitched together using Photoshop. The resulting Z-stack projections enabled us to visualize the distribution of macrophages, angiogenesis, and the localization of nerve fibers throughout the TM. We will include this detailed methodology in our revision to clarify any potential confusion.

      Author response image 1.

      Representative confocal images showing one quadrant of the TM collected from a CSF1R<sup>EGFP</sup> bone marrow-transplanted mouse at day 7 post-perforation. (A-B) 3D-rendered views from different angles reveal the close spatial relationship between CSF1R<sup>EGFP</sup> cells (green) and blood vessels (red) within the TM. (C) Cross-sectional view highlights the depth-wise distribution of CSF1R<sup>EGFP</sup> cells (green) and blood vessels (red) across the layered TM architecture. All images were processed using Imaris Viewer x64 (version 10.2.0).

      Minor concerns

      In Figure 8, the schematic illustration presents a coronal section of the TM. However, based on the data provided in the manuscript, it is unclear whether the authors directly obtained coronal images in their study. To enhance the clarity and impact of the schematic, it would be helpful to include representative images of coronal sections showing macrophage infiltration, angiogenesis, and nerve fiber distribution in the TM.

      As noted above, we utilized Z-stack confocal imaging to capture the full thickness of the TM, enabling us to visualize structures across all three layers. This approach ensured that all layers were included in our analysis. Due to the thin and curved nature of the TM, traditional cross-sectional imaging often struggles to clearly depict the spatial relationships between macrophages, blood vessels, and nerve fibers, especially at low magnification as shown in Author response image 2. In response to the reviewer's suggestion, we will include representative coronal images in the revised manuscript to better illustrate the distribution of these structures at higher magnification.

      Author response image 2.

      Confocal images of eardrum cross-sections collected at day 1 (A), 3 (B), and 7 (C) post perforation to demonstrate the wound healing processes.

      Reviewer #2 (Public review):

      Summary:

      This study examines the role of TRPV1 signaling in the recruitment of monocyte-derived macrophages and the promotion of angiogenesis during tympanic membrane (TM) wound healing. The authors use a combination of genetic mouse models, macrophage depletion, and transcriptomic approaches to suggest that neuronal TRPV1 activity contributes to macrophage-driven vascular responses necessary for tissue repair.

      Strengths:

      (1) The topic of neuroimmune interactions in tissue regeneration is of interest and underexplored in the context of the TM, which presents a unique model due to its anatomical features.

      (2) The use of reporter mice and bone marrow chimeras allows for some dissection of immune cell origin.

      (3) The authors incorporate transcriptomic data to contextualize inflammatory and angiogenic processes during wound healing.

      We sincerely thank Reviewer #2 for their time and effort in improving our study and recognizing its strengths. Below, we are pleased to address the reviewer's concerns point by point.

      Weaknesses:

      (1) The primary claims of the manuscript are not convincingly supported by the evidence presented. Most of the data are correlative in nature, and no direct mechanistic experiments are included to establish causality between TRPV1 signaling and macrophage recruitment or function.

      We appreciate Reviewer #2's perspective on the lack of molecular mechanisms linking TRPV1 signaling and macrophages. However, our data demonstrate that TRPV1 mutations significantly affect macrophage recruitment and angiogenesis. This initial study primarily focuses on how sensory nerve fibers are involved in eardrum immunity and wound healing, an area that has not been clearly reported in the literature before. We believe that further research is necessary to explore this topic in greater depth.

      (2) Functional validation of key molecular players (such as Tac1 or Spp1) is lacking, and their roles are inferred primarily from gene expression data rather than experimentally tested.

      Although we have identified the TAC1 and SPP1 signals as potentially important for TM wound healing for the first time, we agree with the Reviewer's view regarding the lack of molecular mechanisms explored in this study. We have not yet tested the downstream signaling pathways, but we plan to investigate them in a series of future studies. As this is an early report, we will continue to explore these signals and their potential clinical applications based on our initial findings moving forward.

      (3) The reuse of publicly available scRNA-seq data is not sufficiently integrated or extended to yield new biological insights, and it remains largely descriptive.

      We thank Reviewer #2 for highlighting this point. Leveraging publicly available scRNA-seq databases and established analysis pipelines saves time and resources and allows researchers to build on previous work and focus on new biological questions without repeating experiments. For context, our lab recently collected macrophages from the eardrums of postnatal day 15 (P15) mice, and each trial required 20 eardrums from 10 animals to obtain a sufficient number of cells. A prior study conducted by Dr. Tward and his team utilized scRNA-seq data to make initial discoveries related to eardrum wound healing, primarily focusing on epithelial cells rather than macrophages. We are building on their raw data to uncover new biological insights regarding macrophages, which we believe will be valuable to our peers, even though we have not yet experimentally tested the newly identified signals.

      (4) The macrophage depletion model (CX3CR1CreER; iDTR) lacks specificity, and possible off-target or systemic effects are not addressed.

      We agree with Reviewer #2. Although the macrophage depletion model used in our study is a standard and widely used animal model (Shi, Hua et al. 2018), any macrophage depletion model may have off-target or systemic effects. We will discuss this in our revision.

      (5) Several interpretations of the data appear overstated, particularly regarding the necessity of TRPV1 for monocyte recruitment and wound healing.

      We thank the reviewer for pointing this out. We will revise the passages of the manuscript where our interpretations are overstated.

      (6) Overall, the study appears to apply known concepts - namely, TRPV1-mediated neurogenic inflammation and macrophage-driven angiogenesis - to a new anatomical site without providing new mechanistic insight or advancing the field substantially.

      Although our study may not seem highly innovative at first glance, it reveals a previously unknown role of the TRPV1 pain signaling pathway in promoting eardrum healing for the first time. This healing process includes the recruitment of monocyte-derived macrophages and the formation of new blood vessels (angiogenesis). While this process has been documented in other organs, most research on macrophage-driven angiogenesis has been conducted using in vitro models, with very few studies demonstrating this process in vivo. Our findings could lead to new translational opportunities, especially considering that tympanic membrane perforation, along with damage-induced otitis media and conductive hearing loss, are common clinical issues affecting millions of people worldwide. Targeting TRPV1 signaling could enhance tympanic membrane immunity, improve blood circulation, promote the repair of damaged tympanic membranes, and ultimately prevent middle ear infections—an idea that has not been previously proposed.

      Overall:

      While the study addresses an interesting topic, the current version does not provide sufficiently strong or novel evidence to support its major conclusions. Additional mechanistic experiments and more rigorous validation would be necessary to substantiate the proposed model and clarify the relevance of the findings beyond this specific tissue context.

      We greatly thank the two reviewers for their helpful critiques to improve our study. We especially thank the Section Editors for their insightful and constructive comments on this initial study.

      References:

      Shi, J., L. Hua, D. Harmer, P. Li and G. Ren (2018). "Cre Driver Mice Targeting Macrophages." Methods Mol Biol 1784: 263-275.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      This article investigates the origin of movement slowdown in weightlessness by testing two possible hypotheses: the first is based on a strategic and conservative slowdown, presented as a scaling of the motion kinematics without altering its profile, while the second is based on the hypothesis of a misestimation of effective mass by the brain due to an alteration of gravity-dependent sensory inputs, which alters the kinematics following a controller parameterization error.

      Strengths:

      The article convincingly demonstrates that trajectories are affected in 0g conditions, as in previous work. It is interesting, and the results appear robust. However, I have two major reservations about the current version of the manuscript that prevent me from endorsing the conclusion in its current form.

      Weaknesses:

      (1) First, the hypothesis of a strategic and conservative slow down implicitly assumes a similar cost function, which cannot be guaranteed, tested, or verified. For example, previous work has suggested that changing the ratio between the state and control weight matrices produced an alteration in movement kinematics similar to that presented here, without changing the estimated mass parameter (Crevecoeur et al., 2010, J Neurophysiol, 104 (3), 1301-1313). Thus, the hypothesis of conservative slowing cannot be rejected. Such a strategy could vary with effective mass (thus showing a statistical effect), but the possibility that the data reflect a combination of both mechanisms (strategic slowing and mass misestimation) remains open.

      We tested whether changing the ratio between the state and control weight matrices can generate the observed effect. As shown in Author response image 1 and Author response image 2, a change in the cost function cannot simultaneously produce a reduced peak velocity/acceleration and an advance in their timing, but a change in mass estimation can. In other words, mass underestimation alone can explain the two key findings, amplitude reduction and timing advance. We cannot exclude the possibility of a change in cost function on top of the mass underestimation, but the principle of Occam’s Razor favors the simpler explanation, i.e., using body mass underestimation to explain the key findings. We will include our exploration of possible changes in the cost function in the revision (in the Supplemental Materials).

      Author response image 1.

      Simulation using an altered cost function with α = 3.0. Panels A, B, and E show simulated position, velocity, and acceleration profiles, respectively, for the three movement directions. Solid lines correspond to pre- and post-exposure conditions, while dashed lines represent the in-flight condition. Panels C and D display the peak velocity and its timing across the three phases (Pre, In, Post), and Panels F and G show the corresponding peak acceleration and its timing. Note, varying the cost function, while leading to reduced peak velocity/acceleration, leads to an erroneous prediction of delayed timing of peak velocity/acceleration.

      Author response image 2.

      Simulation results using a cost function with α = 0.3. The format is the same as in Author response image 1. Note that this ten-fold decrease in α, while correctly predicting the advanced timing of peak velocity/acceleration, leads to an erroneous prediction of increased peak velocity/acceleration.

      (2) The main strength of the article is the presence of directional effects expected under the hypothesis of mass estimation error. However, the article lacks a clear demonstration of such an effect: indeed, although there appears to be a significant effect of direction, I was not sure that this effect matched the model's predictions. A directional effect is not sufficient because the model makes clear quantitative predictions about how this effect should vary across directions. In the absence of a quantitative match between the model and the data, the authors' claims regarding the role of misestimating the effective mass remain unsupported.

      Our paper does not aim to quantitatively reproduce human reaching movements in microgravity. We will make this clearer in the revision.

      (1) The model is a simplification of the actual situation. For example, the model simulates an ideal case of moving a point mass (effective mass) without friction and without considering Coriolis and centripetal torques, while the actual situation is that people move their finger across a touch screen. The two-link arm model assumes planar movements, but our participants move their hand on a table top without vertical support to constrain their movement in 2D.

      (2) Our study merely uses well-established (though simplified) models to qualitatively predict the overall behavioral patterns if mass underestimation is at play. For this purpose, the results are well in line with models’ qualitative predictions: we indeed confirm that key kinematic features (peak velocity and acceleration) follow the same ranking order of movement direction conditions as predicted.

      (3) Using model simulation to qualitatively predict human behavioral patterns is a common practice in motor control studies, prominent examples including the papers on optimal feedback control (Todorov, 2004 and 2005) and movement vigor (Shadmehr et al., 2016). In fact, our model was inspired by the model in the latter paper.
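      To make the notion of direction-dependent effective mass concrete, the following sketch computes the effective mass at the hand of a planar two-link arm along a given movement direction, using the standard end-point mobility relation m_eff(u) = 1 / (uᵀ J M⁻¹ Jᵀ u). The segment parameters, the posture, and all function names are illustrative assumptions rather than the values or code used in our simulations, so the resulting ranking across directions depends on the posture chosen here.

```python
import numpy as np

# Assumed (illustrative) segment parameters: masses (kg), lengths (m),
# centre-of-mass distances (m), and moments of inertia (kg m^2).
M1, M2 = 1.4, 1.0
L1, L2 = 0.30, 0.33
R1, R2 = 0.11, 0.16
I1, I2 = 0.025, 0.045
A1 = I1 + I2 + M1 * R1**2 + M2 * (L1**2 + R2**2)
A2 = M2 * L1 * R2
A3 = I2 + M2 * R2**2

def inertia_matrix(q2):
    """Joint-space inertia matrix of the two-link arm (depends on elbow angle)."""
    c2 = np.cos(q2)
    return np.array([[A1 + 2 * A2 * c2, A3 + A2 * c2],
                     [A3 + A2 * c2, A3]])

def jacobian(q1, q2):
    """Maps joint velocities (shoulder, elbow) to hand velocity (x, y)."""
    s1, c1 = np.sin(q1), np.cos(q1)
    s12, c12 = np.sin(q1 + q2), np.cos(q1 + q2)
    return np.array([[-L1 * s1 - L2 * s12, -L2 * s12],
                     [L1 * c1 + L2 * c12, L2 * c12]])

def mobility(q1, q2):
    """End-point mobility (inverse of the hand-space inertia)."""
    J = jacobian(q1, q2)
    return J @ np.linalg.inv(inertia_matrix(q2)) @ J.T

def effective_mass(q1, q2, direction_deg):
    """Effective mass felt at the hand along a unit direction in the plane."""
    th = np.deg2rad(direction_deg)
    u = np.array([np.cos(th), np.sin(th)])
    return 1.0 / (u @ mobility(q1, q2) @ u)

if __name__ == "__main__":
    q1, q2 = np.deg2rad(30.0), np.deg2rad(110.0)  # assumed start posture
    for d in (45, 90, 135):
        print(d, effective_mass(q1, q2, d))
```

      Under a mass-underestimation account, the planned forces scale with the estimated effective mass, which is why the predicted kinematic effects differ across movement directions.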

      Citations:

      Todorov, E. (2004). Optimality principles in sensorimotor control. Nature Neuroscience, 7(9), 907.

      Todorov, E. (2005). Stochastic optimal control and estimation methods adapted to the noise characteristics of the sensorimotor system. Neural Computation, 17(5), 1084–1108.

      Shadmehr, R., Huang, H. J., & Ahmed, A. A. (2016). A Representation of Effort in Decision-Making and Motor Control. Current Biology: CB, 26(14), 1929–1934.

      In general, both the hypotheses of slowing motion (out of caution) and misestimating mass have been put forward in the past, and the added value of this article lies in demonstrating that the effect depended on direction. However, (1) a conservative strategy with a different cost function can also explain the data, and (2) the quantitative match between the directional effect and the model's predictions has not been established.

      Specific points:

      (1) I noted a lack of presentation of raw kinematic traces, which would be necessary to convince me that the directional effect was related to effective mass as stated.

      We are happy to include exemplary speed and acceleration trajectories. One example subject’s detailed trajectories are shown below and will be included in the revision. The reduced and advanced velocity/acceleration peaks are visible in typical trials.

      Author response image 3.

      Hand speed profiles (upper panels), hand acceleration profiles (middle panels) and speed profiles of the primary submovements (lower panels) towards different directions from an example participant.

      (2) The presentation and justification of the model require substantial improvement; the reason for their presence in the supplementary material is unclear, as there is space to present the modelling work in detail in the main text. Regarding the model, some choices require justification: for example, why did the authors ignore the nonlinear Coriolis and centripetal terms?

      Response: In brief, our simulations show that Coriolis and centripetal forces, despite having some directional anisotropy, only have small effects on predicted kinematics (see our responses to Reviewer 2). We will move descriptions of the model into the main text with more justifications for using a simple model.

      (3) The increase in the proportion of trials with subcomponents is interesting, but the explanatory power of this observation is limited, as the initial percentage was already quite high (from 60-70% during the initial study to 70-85% in flight). This suggests that the potential effect of effective mass only explains a small increase in a trend already present in the initial study. A more critical assessment of this result is warranted.

      Response: Indeed, the percentage of submovements only increases slightly, but the more important change is that the IPI (the inter-peak interval between submovements) also increases at the same time. Moreover, it is the effect of IPI that significantly predicts the duration increase in our linear mixed model. We will highlight this fact in our revision to avoid confusion.

      Reviewer #2 (Public review):

      This study explores the underlying causes of the generalized movement slowness observed in astronauts in weightlessness compared to their performance on Earth. The authors argue that this movement slowness stems from an underestimation of mass rather than a deliberate reduction in speed for enhanced stability and safety.

      Overall, this is a fascinating and well-written work. The kinematic analysis is thorough and comprehensive. The design of the study is solid, the collected dataset is rare, and the model tends to add confidence to the proposed conclusions. That being said, I have several comments that could be addressed to consolidate interpretations and improve clarity.

      Main comments:

      (1) Mass underestimation

      a) While this interpretation is supported by data and analyses, it is not clear whether this gives a complete picture of the underlying phenomena. The two hypotheses (i.e., mass underestimation vs deliberate speed reduction) can only be distinguished in terms of velocity/acceleration patterns, which should display specific changes during the flight with a mass underestimation. The experimental data generally shows the expected changes but for the 45° condition, no changes are observed during flight compared to the pre- and post-phases (Figure 4). In Figure 5E, only a change in the primary submovement peak velocity is observed for 45°, but this finding relies on a more involved decomposition procedure. It suggests that there is something specific about 45° (beyond its low effective mass). In such planar movements, 45° often corresponds to a movement which is close to single-joint, whereas 90° and 135° involve multi-joint movements. If so, the increased proportion of submovements in 90° and 135° could indicate that participants had more difficulties in coordinating multi-joint movements during flight. Besides inertia, Coriolis and centripetal effects may be non-negligible in such fast planar reaching (Hollerbach & Flash, Biol Cyber, 1982) and, interestingly, they would also be affected by a mass underestimation (thus, this is not necessarily incompatible with the authors' view; yet predicting the effects of a mass underestimation on Coriolis/centripetal torques would require a two-link arm model). Overall, I found the discrepancy between the 45° direction and the other directions under-exploited in the current version of the article. In sum, could the corrective submovements be due to a misestimation of Coriolis/centripetal torques in the multi-joint dynamics (caused specifically -or not- by a mass underestimation)?

      We agree that the effect of mass underestimation is smaller in the 45° direction than in the other two directions, possibly because this movement relies on a single joint (elbow) rather than two joints (elbow and shoulder). In addition, movement correction using one joint is probably easier (as also suggested by another reviewer); this possibility will be further discussed in the revision. However, we find that our model simplification (excluding Coriolis and centripetal torques) does not affect our main conclusions. First, we performed a simple simulation and found that, under the current optimal hand trajectory, incorporating Coriolis and centripetal torques has only a limited impact on the resulting joint torques (see simulations in Author response image 4). One reason is that we used smaller movements than Hollerbach & Flash did. In addition, we applied an optimal feedback control model to a more realistic 2-joint arm configuration. Despite its simplicity, this model produced a speed profile consistent with our current predictions and made similar predictions regarding the effects of mass underestimation (Author response image 5). We will provide a more realistic 2-joint arm model with muscle dynamics in the revision to improve the simulation further, but the message will be the same: including or excluding Coriolis and centripetal torques does not affect the theoretical predictions about mass underestimation. Second, as the reviewer correctly pointed out, the mass (and its underestimation) also affects these two torque terms, so its effect on kinematic measures changes little even with the full model.

      Author response image 4.

      Joint angles and joint torques of the shoulder and elbow with simulated trajectories towards different directions. A. Shoulder (green) and elbow (blue) angles change with time for the 45° movement direction. B. Components of joint interaction torques at the shoulder. Solid line: net torque at the shoulder; dotted line: shoulder inertial torque; dashed line: shoulder Coriolis and centripetal torque. C. Same plot as B for the elbow joint. D–F. Torque components in the full 360° workspace, beyond the three movement directions (45°, 90°, and 135°). D. Net torque. E. Inertial torque. F. Combined Coriolis and centripetal torque. Note that the polar plot of Coriolis/centripetal torques (F) has a scale that is two orders of magnitude smaller than that of the inertial torque in our simulation. All torques were simulated with the optimal movement duration. Torques were squared and integrated over each trajectory.
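The decomposition described in this caption can be illustrated with the textbook equations of motion for a planar two-link arm. The following is a minimal sketch only, not the authors' simulation: the segment parameters, the function name `torque_components`, and the sample state are all hypothetical, and gravity is omitted because the movement is planar.

```python
import math

# Hypothetical segment parameters (1 = upper arm, 2 = forearm); not taken from the paper.
M1, M2 = 1.9, 1.5      # segment masses (kg)
L1 = 0.30              # upper-arm length (m)
R1, R2 = 0.13, 0.16    # joint-to-centre-of-mass distances (m)
I1, I2 = 0.02, 0.02    # moments of inertia about each centre of mass (kg m^2)


def torque_components(q, qd, qdd):
    """Return (inertial, coriolis_centripetal) joint torques as (shoulder, elbow) pairs.

    q, qd, qdd are (shoulder, elbow) angles (rad), velocities and accelerations.
    The net joint torque is the element-wise sum of the two returned pairs.
    """
    _, q2 = q                      # the inertia matrix depends on the elbow angle only
    c2, s2 = math.cos(q2), math.sin(q2)
    m11 = I1 + I2 + M1 * R1**2 + M2 * (L1**2 + R2**2 + 2 * L1 * R2 * c2)
    m12 = I2 + M2 * (R2**2 + L1 * R2 * c2)
    m22 = I2 + M2 * R2**2
    inertial = (m11 * qdd[0] + m12 * qdd[1], m12 * qdd[0] + m22 * qdd[1])
    h = M2 * L1 * R2 * s2          # velocity-coupling coefficient
    coriolis = (-h * (2 * qd[0] * qd[1] + qd[1] ** 2), h * qd[0] ** 2)
    return inertial, coriolis


# Example call with an arbitrary (hypothetical) arm state.
inertial, coriolis = torque_components(q=(0.8, 1.6), qd=(1.0, -2.0), qdd=(10.0, -15.0))
print("inertial:", inertial, "Coriolis/centripetal:", coriolis)
```

Evaluating both terms along a simulated trajectory, then squaring and integrating them, would give quantities comparable in kind to panels D–F; the relative magnitudes depend on the assumed parameters and movement speed.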

      Author response image 5.

      Comparison between simulation results from the full model with the addition of Coriolis/centripetal torques (left) and the simplified model (right). The position profiles (top) and the corresponding speed profiles (bottom) are shown. Solid lines are for normal mass estimation and dashed lines for mass underestimation in microgravity. The three colors represent three movement directions (dark red: 45°, red: 90°, yellow: 135°). The full model used a 2-link arm model without realistic muscle dynamics (these will be included in the formal revision); thus, the speed profile is not smooth. Importantly, the full model also predicts the same effect of mass underestimation, i.e., reduced peak velocity/acceleration and an advance in their timing.

      b) Additionally, since the taikonauts are tested after 2 or 3 weeks in flight, one could also assume that neuromuscular deconditioning explains (at least in part) the general decrease in movement speed. Can the authors explain how to rule out this alternative interpretation? For instance, weaker muscles could account for slower movements within a classical time-effort trade-off (as more neural effort would be needed to generate a similar amount of muscle force, thereby suggesting a purposive slowing down of movement). Therefore, could the observed results (slowing down + more submovements) be explained by some neuromuscular deconditioning combined with a difficulty in coordinating multi-joint movements in weightlessness (due to a misestimation of Coriolis/centripetal torques)?

      Response: Neuromuscular deconditioning is indeed a spaceflight or microgravity effect; thanks for bringing this up, as we omitted the discussion of its possible contribution in the initial submission. However, muscle weakness is less pronounced for upper-limb muscles than for postural and lower-limb muscles (Tesch et al., 2005). Handgrip strength decreases by 5% to 15% after several months (Moosavi et al., 2021); shoulder and elbow muscle atrophy, though not directly measured, was estimated to be minimal (Shen et al., 2017). Muscle weakness is unlikely to play a major role here since our reaching task involves small movements (~12 cm) with joint torques on the order of ~2 N·m. Coriolis/centripetal torques do not affect the putative mass effect (as shown in the simulations above). The reviewer suggests that poor coordination in microgravity might contribute to slowing down + more submovements. Poor coordination is an umbrella term for any motor control problem, and it can explain any microgravity effect. The feedforward control changes caused by mass underestimation can also be viewed as poor coordination. If we limit it to the coordination of the two joints or of Coriolis/centripetal torques, we should expect to see some trajectory curvature changes in microgravity. However, we further analyzed our reaching trajectories and found no sign of curvature increase in our large collection of reaching movements. We probably have the largest dataset of reaching movements collected in microgravity thus far, given that we had 12 taikonauts and each of them performed about 480 to 840 reaching trials during their spaceflight. We believe the probability of Type II error is quite low here. We will include descriptive statistics of these new analyses in our revision.

      Citation: Tesch, P. A., Berg, H. E., Bring, D., Evans, H. J., & LeBlanc, A. D. (2005). Effects of 17-day spaceflight on knee extensor muscle function and size. European journal of applied physiology, 93(4), 463-468.

      Moosavi, D., Wolovsky, D., Depompeis, A., Uher, D., Lennington, D., Bodden, R., & Garber, C. E. (2021). The effects of spaceflight microgravity on the musculoskeletal system of humans and animals, with an emphasis on exercise as a countermeasure: A systematic scoping review. Physiological Research, 70(2), 119.

      Shen, H., Lim, C., Schwartz, A. G., Andreev-Andrievskiy, A., Deymier, A. C., & Thomopoulos, S. (2017). Effects of spaceflight on the muscles of the murine shoulder. The FASEB Journal, 31(12), 5466.

      (2) Modelling

      a) The model description should be improved as it is currently a mix of discrete time and continuous time formulations. Moreover, an infinite-horizon cost function is used, but I thought the authors used a finite-horizon formulation with the prefixed duration provided by the movement utility maximization framework of Shadmehr et al. (Curr Biol, 2016). Furthermore, was the mass underestimation reflected both in the utility model and the optimal control model? If so, did the authors really compute the feedback control gain with the underestimated mass but simulate the system with the real mass? This is important because the mass appears both in the utility framework and in the LQ framework. Given the current interpretations, the feedforward command is assumed to be erroneous, and the feedback command would allow for motor corrections. Therefore, it could be clarified whether the feedback command also misestimates the mass or not, which may affect its efficiency. For instance, if both feedforward and feedback motor commands are based on wrong internal models (e.g., due to the mass underestimation), one may wonder how the astronauts would execute accurate goal-directed movements.

      b) The model seems to be deterministic in its current form (no motor and sensory noise). Since the framework developed by Todorov (2005) is used, sensorimotor noise could have been readily considered. One could also assume that motor and sensory noise increase in microgravity, and the model could inform on how microgravity affects the number of submovements or endpoint variance due to sensorimotor noise changes, for instance.

      c) Finally, how does the model distinguish the feedforward and feedback components of the motor command that are discussed in the paper, given that the model only yields a feedback control law? Does 'feedforward' refer to the motor plan here (i.e., the prefixed duration and arguably the precomputed feedback gain)?

      We appreciate these very helpful suggestions about our model presentation. Indeed, our initial submission did not give detailed model descriptions in the main text, due to text limits for early submissions. We actually used a finite-horizon framework throughout, with a pre-specified duration derived from the utility model. In the revision, we will make that point clear, and we will also revise the Methods section to explicitly distinguish feedforward vs. feedback components, clarify the use of mass underestimation in both utility and control models, and update the equations accordingly.
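The reviewer's question about computing the feedback gain with the underestimated mass while simulating the plant with the real mass can be made concrete with a toy example. The sketch below is not the authors' model: it assumes a deterministic point mass driven directly by force (no muscle filter, no noise), a terminal-cost-only finite-horizon LQR, and illustrative weights, duration, distance and masses. The function names `lqr_gains` and `simulate` are hypothetical.

```python
def lqr_gains(planned_mass, n_steps, dt, r=1e-6, w_pos=1e3, w_vel=1e1):
    """Backward Riccati recursion for a point mass, state = (position error, velocity).

    Cost: sum of r * force^2 over the movement plus a terminal penalty
    w_pos * error^2 + w_vel * velocity^2. All weights are illustrative.
    """
    b = dt / planned_mass               # velocity change per unit force per step
    p11, p12, p22 = w_pos, 0.0, w_vel   # symmetric cost-to-go matrix at the final step
    gains = [None] * n_steps
    for k in reversed(range(n_steps)):
        s = r + b * b * p22
        k1 = b * p12 / s
        k2 = b * (p12 * dt + p22) / s
        gains[k] = (k1, k2)
        # Cost-to-go update P <- A^T P (A - B K), written out for the 2x2 case.
        pa11 = p11 - p12 * b * k1
        pa12 = p11 * dt + p12 * (1.0 - b * k2)
        pa21 = p12 - p22 * b * k1
        pa22 = p12 * dt + p22 * (1.0 - b * k2)
        n12 = 0.5 * (pa12 + dt * pa11 + pa21)   # symmetrise against round-off
        p11, p12, p22 = pa11, n12, dt * pa12 + pa22
    return gains


def simulate(gains, true_mass, distance, dt):
    """Run the feedback law on a plant with the true mass; return (final error, forces, speeds)."""
    error, velocity = -distance, 0.0    # error = position minus target
    forces, speeds = [], []
    for k1, k2 in gains:
        force = -(k1 * error + k2 * velocity)
        forces.append(force)
        error += dt * velocity
        velocity += dt * force / true_mass
        speeds.append(velocity)
    return error, forces, speeds


# Hypothetical values: 350 ms movement, 12 cm reach, 2 kg effective mass, 30% underestimation.
DT, N_STEPS, DISTANCE, TRUE_MASS = 0.005, 70, 0.12, 2.0
matched = simulate(lqr_gains(TRUE_MASS, N_STEPS, DT), TRUE_MASS, DISTANCE, DT)
mismatched = simulate(lqr_gains(0.7 * TRUE_MASS, N_STEPS, DT), TRUE_MASS, DISTANCE, DT)
print("initial force (N):", matched[1][0], mismatched[1][0])
print("peak speed (m/s):", max(matched[2]), max(mismatched[2]))
```

In this setup the same gains play both roles: the early force is effectively the planned (feedforward-like) output, and later forces correct the error that accumulates because the plant is heavier than assumed. Whether this matches the feedforward/feedback distinction intended in the manuscript is exactly what the revised Methods would need to state.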

      (3) Brevity of movements and speed-accuracy trade-off

      The tested movements are much faster (average duration approx. 350 ms) than similar self-paced movements that have been studied in other works (e.g., Wang et al., J Neurophysiology, 2016; Berret et al., PLOS Comp Biol, 2021, where movements can last about 900-1000 ms). This is consistent with the instructions to reach quickly and accurately, in line with a speed-accuracy trade-off. Was this instruction given to highlight the inertial effects related to the arm's anisotropy? One may, however, wonder if the same results would hold for slower self-paced movements (are they also with reduced speed compared to Earth performance?). Moreover, a few other important questions might need to be addressed for completeness: how can one ensure that the astronauts remembered this instruction during the flight? (Could the control group move faster because they better remembered the instruction?) Did the taikonauts perform the experiment on their own during the flight, or did one taikonaut assume the role of the experimenter?

      Thanks for highlighting the brevity of movements in our experiment. Our intention in emphasizing fast movements is to rigorously test whether movement is indeed slowed down in microgravity. The observed prolonged movement duration clearly shows that microgravity affects people’s movement duration, even when they are pushed to move fast. The second reason for using fast movements is to highlight that feedforward control is affected in microgravity. Mass underestimation primarily affects feedforward control. Slow movements would inevitably involve online corrections that might obscure the effect of mass underestimation. Note that movement slowing is observed not only in our speed-emphasized reaching task, but also in whole-arm pointing in other astronaut studies (Berger, 1997; Sangals, 1999), which are cited in our paper. We thus believe these findings are generalizable.

      Regarding the consistency of instructions: all our experiments conducted in the Tiangong space station were monitored in real time by experimenters in the Control Center located in Beijing. The task instructions were presented on the initial display of the data acquisition application and ample reading time was allowed. In fact, all the pre-, in-, and post-flight test sessions were administered by the same group of experimenters with the same instruction. It is common for astronauts to serve as both participants and experimenters, and they were well trained for this type of role on the ground. Note that we had multiple pre-flight test sessions to familiarize them with the task. All these rigorous measures were in place to obtain high-quality data. We will include these experimental details and the rationales for emphasizing fast movements in the revision.

      Citations:

      Berger, M., Mescheriakov, S., Molokanova, E., Lechner-Steinleitner, S., Seguer, N., & Kozlovskaya, I. (1997). Pointing arm movements in short- and long-term spaceflights. Aviation, Space, and Environmental Medicine, 68(9), 781–787.

      Sangals, J., Heuer, H., Manzey, D., & Lorenz, B. (1999). Changed visuomotor transformations during and after prolonged microgravity. Experimental Brain Research. Experimentelle Hirnforschung. Experimentation Cerebrale, 129(3), 378–390.

      (4) No learning effect

      This is a surprising effect, as mentioned by the authors. Other studies conducted in microgravity have indeed revealed an optimal adaptation of motor patterns in a few dozen trials (e.g., Gaveau et al., eLife, 2016). Perhaps the difference is again related to single-joint versus multi-joint movements. This should be better discussed given the impact of this claim. Typically, why would a "sensory bias of bodily property" persist in microgravity and be a "fundamental constraint of the sensorimotor system"?

      We believe the differences between our study and Gaveau et al.’s study cannot be simply attributed to single-joint versus multi-joint movements. One of the most salient differences is that their adaptation is about incorporating microgravity in control for minimizing effort, while our adaptation is about rightfully perceiving body mass. We will elaborate on possible reasons for the lack of learning in the light of this previous study.

      We can elaborate on “sensory bias” and “fundamental constraint of the sensorimotor system”. If an inertial change is perceived (like an extra weight attached to the forearm, as in previous motor adaptation studies), people can adapt their reaching in tens of trials. In this case, sensory cues are veridical as they correctly inform about the inertial perturbation. However, in microgravity, reduced gravitational pull and proprioceptive inputs constantly inform the controller that the body mass is less than its actual magnitude. In other words, sensory cues in space are misleading for estimating body mass. The resulting sensory bias prevents the sensorimotor system from adapting correctly. Our statement was too brief in the initial submission; we will expand it in the revision.

      Reviewer #3 (Public review):

      Summary:

      The authors describe an interesting study of arm movements carried out in weightlessness after a prolonged exposure to the so-called microgravity conditions of orbital spaceflight. Subjects performed radial point-to-point motions of the fingertip on a touch pad. The authors note a reduction in movement speed in weightlessness, which they hypothesize could be due to either an overall strategy of lowering movement speed to better accommodate the instability of the body in weightlessness or an underestimation of body mass. They conclude for the latter, mainly based on two effects. One, slowing in weightlessness is greater for movement directions with higher effective mass at the end effector of the arm. Two, they present evidence for an increased number of corrective submovements in weightlessness. They contend that this provides conclusive evidence to accept the hypothesis of an underestimation of body mass.

      Strengths:

      In my opinion, the study provides a valuable contribution, the theoretical aspects are well presented through simulations, the statistical analyses are meticulous, the applicable literature is comprehensively considered and cited, and the manuscript is well written.

      Weaknesses:

      Nevertheless, I am of the opinion that the interpretation of the observations leaves room for other possible explanations of the observed phenomenon, thus weakening the strength of the arguments.

      First, I would like to point out an apparent (at least to me) divergence between the predictions and the observed data. Figures 1 and S1 show that the difference between predicted values for the 3 movement directions is almost linear, with predictions for 90º midway between predictions for 45º and 135º. The effective mass at 90º appears to be much closer to that of 45º than to that of 135º (Figure S1A). But the data shown in Figure 2 and Figure 3 indicate that movements at 90º and 135º are grouped together in terms of reaction time, movement duration, and peak acceleration, while both differ significantly from those values for movements at 45º.

      Furthermore, in Figure 4, the change in peak acceleration time and relative time to peak acceleration between 1g and 0g appears to be greater for 90º than for 135º, which appears to me to be at least superficially in contradiction with the predictions from Figure S1. If the effective mass is the key parameter, wouldn't one expect as much difference between 90º and 135º as between 90º and 45º? It is true that peak speed (Figure 3B) and peak speed time (Figure 4B) appear to follow the ordering according to effective mass, but is there a mathematical explanation as to why the ordering is respected for velocity but not acceleration? These inconsistencies weaken the authors' conclusions and should be addressed.

      Indeed, the model predicts an almost equal separation between 45° and 90° and between 90° and 135°, while the data indicate that the spacing between 45° and 90° is much smaller than between 90° and 135°. We do not regard the divergence as evidence undermining our main conclusion since 1) the model is a simplification of the actual situation. For example, the model simulates an ideal case of moving a point mass (effective mass) without friction and without considering Coriolis and centripetal torques. 2) Our study does not make quantitative predictions of all the key kinematic measures; that will require model fitting and parameter estimation; instead, our study uses well-established (though simplified) models to qualitatively predict the overall behavioral pattern we would observe. For this purpose, our results are well in line with our expectations: though we did not find equal spacing between direction conditions, we do confirm that the key kinematic properties (Figure 2 and Figure 3 as questioned) follow the same ranking order of directions as predicted.

      We thank the reviewer for pointing out the apparent discrepancy between model simulation and observed data. We will elaborate on the reasons behind the discrepancy in the revision.
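The direction dependence of effective mass discussed here follows from the standard end-point mobility of a two-link arm, where the effective mass along a unit direction u is 1 / (uᵀ J M⁻¹ Jᵀ u). The sketch below illustrates that formula under stated assumptions; the segment parameters, the posture, and the function name `effective_mass` are hypothetical and are not the values used in the manuscript.

```python
import math

# Hypothetical segment parameters (1 = upper arm, 2 = forearm); not taken from the paper.
M1, M2 = 1.9, 1.5      # segment masses (kg)
L1, L2 = 0.30, 0.33    # segment lengths (m)
R1, R2 = 0.13, 0.16    # joint-to-centre-of-mass distances (m)
I1, I2 = 0.02, 0.02    # moments of inertia about each centre of mass (kg m^2)


def effective_mass(q1, q2, direction_deg):
    """Effective hand mass (kg) along a direction, for shoulder/elbow angles q1, q2 (rad)."""
    # Joint-space inertia matrix M(q) and its inverse.
    c2 = math.cos(q2)
    m11 = I1 + I2 + M1 * R1**2 + M2 * (L1**2 + R2**2 + 2 * L1 * R2 * c2)
    m12 = I2 + M2 * (R2**2 + L1 * R2 * c2)
    m22 = I2 + M2 * R2**2
    det = m11 * m22 - m12 * m12
    inv = ((m22 / det, -m12 / det), (-m12 / det, m11 / det))
    # Hand Jacobian J(q).
    s1, c1 = math.sin(q1), math.cos(q1)
    s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
    jac = ((-L1 * s1 - L2 * s12, -L2 * s12), (L1 * c1 + L2 * c12, L2 * c12))
    # Project the unit direction into joint space: w = J^T u.
    theta = math.radians(direction_deg)
    ux, uy = math.cos(theta), math.sin(theta)
    w1 = jac[0][0] * ux + jac[1][0] * uy
    w2 = jac[0][1] * ux + jac[1][1] * uy
    # Mobility u^T J M^-1 J^T u; its reciprocal is the effective mass.
    mobility = w1 * (inv[0][0] * w1 + inv[0][1] * w2) + w2 * (inv[1][0] * w1 + inv[1][1] * w2)
    return 1.0 / mobility


# Hypothetical posture: shoulder at 45 degrees, elbow flexed 90 degrees.
for direction in (45, 90, 135):
    print(direction, effective_mass(math.radians(45), math.radians(90), direction))
```

Because effective mass is the reciprocal of a quadratic form in direction, its spacing across 45°, 90° and 135° is generally not uniform and depends on posture, which is one reason why equal spacing in the data need not be expected.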

      Then, to strengthen the conclusions, I feel that the following points would need to be addressed:

      (1) The authors model the movement control through equations that derive the input control variable in terms of the force acting on the hand and treat the arm as a second-order low-pass filter (Equation 13). Underestimation of the mass in the computation of a feedforward command would lead to a lower-than-expected displacement to that command. But it is not clear if and how the authors account for a potential modification of the time constants of the 2nd order system. The CNS does not effectuate movements with pure torque generators. Muscles have elastic properties that depend on their tonic excitation level, reflex feedback, and other parameters. Indeed, Fisk et al.* showed variations of movement characteristics consistent with lower muscle tone, lower bandwidth, and lower damping ratio in 0g compared to 1g. Could the variations in the response to the initial feedforward command be explained by a misrepresentation of the limbs' damping and natural frequency, leading to greater uncertainty about the consequences of the initial command? This would still be an argument for unadapted feedforward control of the movement, leading to the need for more corrective movements. But it would not necessarily reflect an underestimation of body mass.

      *Fisk, J., Lackner, J. R., & DiZio, P. (1993). Gravitoinertial force level influences arm movement control. Journal of Neurophysiology, 69(2), 504-511.

      We agree that muscle properties, tonic excitation level, and proprioception-mediated reflexes all contribute to reaching control. The study by Fisk et al. (1993) indeed showed that arm movement kinematics change, possibly owing to lower muscle tone and/or damping. However, reduced muscle damping and reduced spindle activity are more likely to affect feedback-based movements. In Fisk et al.’s study, participants performed continuous arm movements with eyes closed; thus their movements largely relied on proprioceptive control. Our major findings are about feedforward control, i.e., the reduced and “advanced” peak velocity/acceleration in discrete and ballistic reaching movements. Note that the peak acceleration happens as early as approximately 90-100 ms into the movements, clearly showing that feedforward control is affected -- a different effect from Fisk et al.’s findings. It is unlikely that people “advanced” their peak velocity/acceleration because they felt the need for more corrective movements later. Thus, underestimation of body mass remains the most plausible explanation.

      (2) The movements were measured by having the subjects slide their finger on the surface of a touch screen. In weightlessness, the implications of this contact are expected to be quite different than those on the ground. In weightlessness, the taikonauts would need to actively press downward to maintain contact with the screen, while on Earth, gravity will do the work. The tangential forces that resist movement due to friction might therefore be different in 0g. This could be particularly relevant given that the effect of friction would interact with the limb in a direction-dependent fashion, given the anisotropy of the equivalent mass at the fingertip evoked by the authors. Is there some way to discount or control for these potential effects?

      We agree that friction might play a role here, but normal interaction with a touch screen typically involves friction between 0.1 and 0.5N (e.g., Ayyildiz et al., 2018). We believe that the directional variation is even smaller than 0.1N. It is very small compared to the force used to accelerate the arm for the reaching movement (10-15N). Thus, friction anisotropy is unlikely to explain our data.

      Citation: Ayyildiz M, Scaraggi M, Sirin O, Basdogan C, Persson BNJ. Contact mechanics between the human finger and a touchscreen under electroadhesion. Proc Natl Acad Sci U S A. 2018 Dec 11;115(50):12668-12673.

      (3) The carefully crafted modelling of the limb neglects, nevertheless, the potential instability of the base of the arm. While the taikonauts were able to use their left arm to stabilize their bodies, it is not clear to what extent active stabilization with the contralateral limb can reproduce the stability of the human body seated in a chair in Earth gravity. Unintended motion of the shoulder could account for a smaller-than-expected displacement of the hand in response to the initial feedforward command and/or greater propensity for errors (with a greater need for corrective submovements) in 0g. The direction of movement with respect to the anchoring point could lead to the dependence of the observed effects on movement direction. Could this be tested in some way, e.g., by testing subjects on the ground while standing on an unstable base of support or sitting on a swing, with the same requirement to stabilize the torso using the contralateral arm?

      Body stabilization is always a challenge for human movement studies in space. We minimized its potential confounding effects by using left-hand grasping and foot straps for postural support throughout the experiment. We would argue shoulder stability is an unlikely explanation because unexpected shoulder instability should not affect the feedforward (early) part of the ballistic reaching movement: the reduced peak acceleration and its early peak were observed at about 90-100 ms after movement initiation. This effect is too early to be explained by an unexpected stability issue.

      The arguments for an underestimation of body mass would be strengthened if the authors could address these points in some way.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study from Zhu and colleagues, a clear role for MED26 in mouse and human erythropoiesis is demonstrated that is also mapped to amino acids 88-480 of the human protein. The authors also show the unique expression of MED26 in later-stage erythropoiesis and propose transcriptional pausing and condensate formation mechanisms for MED26's role in promoting erythropoiesis. Despite the authors' introductory claim that many questions regarding Pol II pausing in mammalian development remain unanswered, the importance of transcriptional pausing in erythropoiesis has actually already been demonstrated (Martell-Smart, et al. 2023, PMID: 37586368, which the authors notably did not cite in this manuscript). Here, the novelty and strength of this study is MED26 and its unique expression kinetics during erythroid development.

      Strengths:

      The widespread characterization of kinetics of mediator complex component expression throughout the erythropoietic timeline is excellent and shows the interesting divergence of MED26 expression pattern from many other mediator complex components. The genetic evidence in conditional knockout mice for erythropoiesis requiring MED26 is outstanding. These are completely new models from the investigators and are an impressive amount of work to have both EpoR-driven deletion and inducible deletion. The effect on red cell number is strong in both. The genetic over-expression experiments are also quite impressive, especially the investigators' structure-function mapping in primary cells. Overall the data is quite convincing regarding the genetic requirement for MED26. The authors should be commended for demonstrating this in multiple rigorous ways.

      Thank you for your positive feedback.

      Weaknesses:

      (1) The authors state that MED26 was nominated for study based on RNA-seq analysis of a prior published dataset. They do not however display any of that RNA-seq analysis with regards to Mediator complex subunits. While they do a good job showing protein-level analysis during erythropoiesis for several subunits, the RNA-seq analysis would allow them to show the developmental expression dynamics of all subunit members.

      Thank you for this helpful suggestion. While we did not originally nominate MED26 based on RNA-seq analysis, we have analyzed the transcript levels of Mediator complex subunits in our RNA-seq data across different stages of erythroid differentiation (Author response image 1). The results indicate that most Mediator subunits, including MED26, display decreased RNA expression over the course of differentiation, with the exception of MED25, as reported previously (Pope et al., Mol Cell Biol 2013. PMID: 23459945).

      Notably, our study is based on initial observations at the protein level, where we found that, unlike most other Mediator subunits that are downregulated during erythropoiesis, MED26 remains relatively abundant. Protein expression levels more directly reflect the combined influences of transcription, translation and degradation processes within cells, and are likely more closely related to biological functions in this context. It is possible that post-transcriptional regulation (such as m6A-mediated improvement of translational efficiency) or post-translational modifications (like escape from ubiquitination) could contribute to the sustained levels of MED26 protein, and this will be an interesting direction for future investigation.

      Author response image 1.

      Relative RNA expression of Mediator complex subunits during erythropoiesis in human CD34+ erythroid cultures. Different differentiation stages from HSPCs to late erythroblasts were identified using CD71 and CD235a markers, progressing sequentially as CD71-CD235a-, CD71+CD235a-, CD71+CD235a+, and CD71-CD235a+. Expression levels were presented as TPM (transcripts per million).
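For readers unfamiliar with the unit, TPM is conventionally obtained by dividing each gene's read count by its transcript length and rescaling so that the values in a sample sum to one million. The sketch below shows that standard definition only; it is not the authors' pipeline, and the gene labels, counts and lengths are hypothetical placeholders.

```python
def tpm(counts, lengths_kb):
    """Convert per-gene read counts to transcripts per million (TPM).

    counts: dict of gene -> read count in one sample.
    lengths_kb: dict of gene -> transcript length in kilobases.
    """
    # Length-normalised rate (reads per kilobase) for each gene.
    rates = {gene: counts[gene] / lengths_kb[gene] for gene in counts}
    total = sum(rates.values())
    # Rescale so that all genes in the sample sum to one million.
    return {gene: 1e6 * rate / total for gene, rate in rates.items()}


# Placeholder genes with made-up counts and lengths.
example = tpm({"geneA": 100, "geneB": 200}, {"geneA": 1.0, "geneB": 2.0})
print(example)
```

Because TPM values within a sample always sum to the same total, they describe relative rather than absolute abundance, which is worth keeping in mind when comparing stages whose overall transcriptional output differs.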

      (2) The authors use an EpoR Cre for red cell-specific MED26 deletion. However, other studies have now shown that the EpoR Cre can also lead to recombination in the macrophage lineage, which clouds some of the in vivo conclusions for erythroid specificity. That being said, the in vitro erythropoiesis experiments here are convincing that there is a major erythroid-intrinsic effect.

      Thank you for this insightful comment. We recognize that EpoR-Cre can drive recombination in both erythroid and macrophage lineages (Zhang et al., Blood 2021, PMID: 34098576). However, EpoR-Cre remains the most widely used Cre for studying erythroid lineage effects in the hematopoietic community. Numerous studies have employed EpoR-Cre for erythroid-specific gene knockout models (Pang et al., Mol Cell Biol 2021, PMID: 22566683; Santana-Codina et al., Haematologica 2019, PMID: 30630985; Xu et al., Science 2013, PMID: 21998251).

      While a GYPA (CD235a)-Cre model with erythroid specificity has recently been developed (https://www.sciencedirect.com/science/article/pii/S0006497121029074), it has not yet been officially published. We look forward to utilizing the GYPA-Cre model for future studies. As you noted, our in vivo mouse model and primary human CD34+ erythroid differentiation system both demonstrate that MED26 is essential for erythropoiesis, suggesting that the regulatory effects of MED26 in our study are predominantly erythroid-intrinsic.

      (3) The donor chimerism assessment of mice transplanted with MED26 knockout cells is a bit troubling. First, there are no staining controls shown and the full gating strategy is not shown. Furthermore, the authors use the CD45.1/CD45.2 system to differentiate between donor and recipient cells in erythroblasts. However, CD45 is not expressed from the CD235a+ stage of erythropoiesis onwards, so it is unclear how the authors are detecting essentially zero CD45-negative cells in the erythroblast compartment. This is quite odd and raises questions about the results. That being said, the red cell indices in the mice are the much more convincing data.

      Thank you for your careful and thorough feedback. We have now included negative staining controls (Author response image 2A, top). We agree that CD45 is typically not expressed in erythroid precursors in normal development. Prior studies have characterized BFU-E and CFU-E stages as c-Kit+CD45+Ter119−CD71low and c-Kit+CD45−Ter119−CD71high cells in fetal liver (Katiyar et al, Cells 2023, PMID: 37174702).

      However, our observations indicate that erythroid surface markers differ during hematopoiesis reconstitution following bone marrow transplantation.  We found that nearly all nucleated erythroid progenitors/precursors (Ter119+Hoechst+) express CD45 after hematopoiesis reconstitution (Author response image 2A, bottom).

      To validate our assay, we performed next-generation sequencing by first mixing mouse CD45.1 and CD45.2 total bone marrow cells at a 1:2 ratio. We then isolated nucleated erythroid progenitors/precursors (Ter119+Hoechst+) by FACS and sequenced the CD45 gene locus by targeted sequencing. The resulting CD45 allele distribution matched our initial mixing ratio, confirming the accuracy of our approach (Author response image 2B).

      Moreover, a recent study supports that reconstituted erythroid progenitors can indeed be distinguished by CD45 expression following bone marrow transplantation (He et al., Nature Aging 2024, PMID: 38632351. Extended Data Fig. 8). 

      In conclusion, our data indicate that newly formed erythroid progenitors/precursors post-transplant express CD45, enabling us to identify nucleated erythroid progenitors/precursors by Ter119+Hoechst+ and determine their origin using CD45.1 and CD45.2 markers.

      Author response image 2.

      Representative flow cytometry gating strategy of erythroid chimerism following mouse bone marrow transplantation. A. Gating strategy used in the erythroid chimerism assay. B. Targeted sequencing result of Ter119+Hoechst+ cells isolated by FACS. The cell sample was pre-mixed with 1/3 CD45.2 and 2/3 CD45.1 bone marrow cells. Ptprc is the gene locus for CD45.

      (4) The authors make heavy use of defining "erythroid gene" sets and "non-erythroid gene" sets, but it is unclear what those lists of genes actually are. This makes it hard to assess any claims made about erythroid and non-erythroid genes.

      Thank you for this helpful suggestion. We defined "erythroid genes" and "non-erythroid genes" based on RNA-seq data from Ludwig et al. (Cell Reports 2019. PMID: 31189107. Figure 2 and Table S1). Genes downregulated from stages k1 to k5 are classified as “non-erythroid genes,” while genes upregulated from stages k6 to k7 are classified as “erythroid genes.” We will add this description in the revised manuscript.
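One way to express the gene-set definition above is as a simple lookup on cluster labels. This sketch assumes that k1 to k7 are cluster labels assigned to individual genes in Table S1 of Ludwig et al.; that reading, the function name `classify_genes`, and the placeholder gene names are assumptions rather than details confirmed in the response.

```python
# Assumed mapping from cluster label to gene-set name (see the hedged lead-in above).
ERYTHROID_CLUSTERS = {"k6", "k7"}
NON_ERYTHROID_CLUSTERS = {"k1", "k2", "k3", "k4", "k5"}


def classify_genes(gene_to_cluster):
    """Group genes into 'erythroid' and 'non_erythroid' sets by their cluster label."""
    gene_sets = {"erythroid": set(), "non_erythroid": set()}
    for gene, cluster in gene_to_cluster.items():
        if cluster in ERYTHROID_CLUSTERS:
            gene_sets["erythroid"].add(gene)
        elif cluster in NON_ERYTHROID_CLUSTERS:
            gene_sets["non_erythroid"].add(gene)
        # Genes with any other label are left unclassified.
    return gene_sets


# Placeholder gene names and labels, for illustration only.
print(classify_genes({"geneA": "k2", "geneB": "k7", "geneC": "unassigned"}))
```

Publishing the resulting lists alongside the revised manuscript would let readers check any claim made about the two gene sets directly.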

      (5) Overall the data regarding condensate formation is difficult to interpret and is the weakest part of this paper. It is also unclear how studies of in vitro condensate formation or studies in 293T or K562 cells can truly relate to highly specialized erythroid biology. This does not detract from the major findings regarding genetic requirements of MED26 in erythropoiesis.

      Thank you for the rigorous feedback. Assessing the condensate properties of MED26 protein in primary CD34+ erythroid cells or mouse models is indeed challenging. As is common in many condensate studies, we used in vitro assays and cellular assays in HEK293T and K562 cells to examine the biophysical properties (Figure S7), condensate formation capacity (Figure 5C and Figure S7C), key phase-separation regions of MED26 protein (Figure S6), and recruitment of pausing factors (Figure 6A-B) in live cells. We then conducted functional assays to demonstrate that the phase-separation region of MED26 can promote erythroid differentiation similarly to the full-length protein in the CD34+ system and K562 cells (Figure 5A). Specifically, overexpressing the MED26 phase-separation domain accelerates erythropoiesis in primary human erythroid culture, while deleting the Intrinsically Disordered Region (IDR) impairs MED26’s ability to form condensates and recruit PAF1 in K562 cells.

      In summary, we used HEK293T cells to study the biochemical and biophysical properties of MED26, and the primary CD34+ differentiation system to examine its developmental roles. Our findings support the conclusion that MED26-associated condensate formation promotes erythropoiesis.

      (6) For many figures, there are some panels where conclusions are drawn, but no statistical quantification of whether a difference is significant or not.

      Thank you for your thorough feedback. We have checked all figures for statistical quantification and added the relevant statistical analysis methods to the corresponding figure legends (Figure 2L and Figure S4C) to clarify the significance of the observed differences. The updated information will be incorporated into the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Zhu et al describes a novel role for MED26, a subunit of the Mediator complex, in erythroid development. The authors have discovered that MED26 promotes transcriptional pausing of RNA Pol II, by recruiting pausing-related factors.

      Strengths:

      This is a well-executed study. The authors have employed a range of cutting-edge and appropriate techniques to generate their data, including: CUT&Tag to profile chromatin changes and mediator complex distribution; nuclear run-on sequencing (PRO-seq) to study Pol II dynamics; knockout mice to determine the phenotype of MED26 perturbation in vivo; an ex vivo erythroid differentiation system to perform additional, important, biochemical and perturbation experiments; immunoprecipitation mass spectrometry (IP-MS); and the "optoDroplet" assay to study phase-separation and molecular condensates.

      This is a real highlight of the study. The authors have managed to generate a comprehensive picture by employing these multiple techniques. In doing so, they have also managed to provide greater molecular insight into the workings of the MEDIATOR complex, an important multi-protein complex that plays an important role in a range of biological contexts. The insights the authors have uncovered for different subunits in erythropoiesis will very likely have ramifications in many other settings, in both healthy biology and disease contexts.

      Thank you for your thoughtful summary and encouraging feedback.

      Weaknesses:

      There are almost no discernible weaknesses in the techniques used, nor the interpretation of the data. The IP-MS data was generated in HEK293 cells when it could have been performed in the human CD34+ HSPC system that they employed to generate a number of the other data. This would have been a more natural setting and would have enabled a more like-for-like comparison with the other data.

      Thank you for your positive feedback and insightful suggestions. We will perform validation of the immunoprecipitation results in CD34+ derived erythroid cells to further confirm our findings.

      Reviewer #3 (Public review):

      Summary:

      The authors aim to explore whether other subunits besides MED1 exert specific functions during the process of terminal erythropoiesis with global gene repression, and finally they demonstrated that MED26-enriched condensates drive erythropoiesis through modulating transcription pausing.

      Strengths:

      Through both in vitro and in vivo models, the authors showed that while MED1 and MED26 co-occupy a plethora of genes important for cell survival and proliferation at the HSPC stage, MED26 preferentially marks erythroid genes and recruits pausing-related factors for cell fate specification. Gradually, MED26 becomes the dominant factor in shaping the composition of transcription condensates and transforms the chromatin towards a repressive yet permissive state, achieving global transcription repression in erythropoiesis.

      Thank you for your positive summary and feedback.

      Weaknesses:

      In the in vitro model, the author only used CD34+ cell-derived erythropoiesis as the validation, which is relatively simple, and more in vitro erythropoiesis models need to be used to strengthen the conclusion.

      Thank you for your thoughtful suggestions. We have shown that MED26 promotes erythropoiesis using the primary human CD34+ differentiation system (Figure 2 K-M and Figure S4) and have demonstrated its essential role in erythropoiesis through multiple mouse models (Figure 2A-G and Figure S1-3). Together, these in vitro and in vivo results support our conclusion that MED26 regulates erythropoiesis. However, we are open to further validating our findings with additional in vitro erythropoiesis models, such as iPSC or HUDEP erythroid differentiation systems.

    1. Author Response

      Reviewer #1 (Public Review):

      [...] Genes expressed in the same direction in lowland individuals facing hypoxia (the plastic state) as what is found in the colonised state are defined as adaptive, while genes with the opposite expression pattern were labelled as maladaptive, using the assumption that the colonised state must represent the result of natural selection. Furthermore, genes could be classified as representing reversion plasticity when the expression pattern differed between the plasticity and colonised states and as reinforcement when they were in the same direction (for example more expressed in the plastic state and the colonised state than in the ancestral state). They found that more genes had a plastic expression pattern that was labelled as maladaptive than adaptive. Therefore, some of the genes have an expression pattern in accordance with what would be predicted based on the plasticity-first hypothesis, while others do not.

      Thank you for a precise summary of our work. We appreciate the very encouraging comments recognizing the value of our work. We have addressed concerns from the reviewer in greater detail below.

      Q1. As pointed out by the authors themselves, the fact that temperature was not included as a variable, which would make the experimental design much more complex, misses the opportunity to more accurately reflect the environmental conditions that the colonizer individuals face at high altitude. Also pointed out by the authors, the acclimation experiment in hypoxia lasted 4 weeks. It is possible that longer term effects would be identifiable in gene expression in the lowland individuals facing hypoxia on a longer time scale. Furthermore, a sample size of 3 or 4 individuals per group depending on the tissue for wild individuals may miss some of the natural variation present in these populations. Stating that they have a n=7 for the plastic stage and n= 14 for the ancestral and colonized stages refers to the total number of tissue samples and not the number of individuals, according to supplementary table 1.

      We share the reviewer's concerns. They arise partly because it is quite challenging to bring wild birds into captivity for hypoxia acclimation experiments; we worked hard to keep lowland sparrows under hypoxic conditions for a month. We recognize the same set of limitations that the reviewer pointed out and have discussed them in the study, i.e., considering the hypoxic condition alone, the short acclimation period, etc. Regarding sample sizes, we collected cardiac muscle from nine individuals (three individuals for each stage) and flight muscle from 12 individuals (four individuals for each stage). We have clarified this in Supplementary Table 1.

      Q2. Finally, I could not find a statement indicating that the lowland individuals placed in hypoxia (plastic stage) were from the same population as the lowland individuals for which transcriptomic data was already available, used as the "ancestral state" group (which themselves seem to come from 3 populations, Qinhuangdao, Beijing, and Tianjin, according to supplementary table 2) nor if they were sampled in the same time of year (pre reproduction, during breeding, after, or if they were juveniles, proportion of males or females, etc). These two aspects could affect both gene expression (through neutral or adaptive genetic variation among lowland populations that can affect gene expression, or environmental effects other than hypoxia that differ in these populations' environments or because of their sexes or age). This could potentially also affect the FST analysis done by the authors, which they use to claim that strong selective pressure acted on the expression level of some of the genes in the colonised group.

      The reviewer asked how the individual tree sparrows used in the transcriptomic analyses were collected. The individuals used for the hypoxia acclimation experiment and those representing the ancestral lowland population were collected from the same locality (Beijing) and in the same season (i.e., pre-breeding). They are all adults and weigh approximately 18 g. We have clarified this in Supplementary Table S1 and the Methods. We did not distinguish males from females (both sexes look similar) under the assumption that both sexes respond similarly to hypoxia acclimation in their cardiac and flight muscle gene expression.

      Supplementary Table 2 lists the individuals that were used for sequence analyses. These individuals were used only for sequence comparisons, not for the transcriptomic analyses. The population genetic structure analyzed in a previously published study showed that there is no clear genetic divergence within the lowland population (i.e., individuals collected from Beijing, Tianjin and Qinhuangdao) or the highland population (i.e., Gangcha and Qinghai Lake). In addition, there was no clear genetic divergence between the highland and lowland populations (Qu et al. 2020).

      Author response image 1.

      Population genetic structure of the Eurasian Tree Sparrow (Passer montanus). The genetic structure generated using FRAPPE. The colors in each column represent the contribution from each subcluster (Qu et al. 2020). Yellow, highland population; blue, lowland population.

      Q4. Impact of the work. There has been work showing that populations adapted to high altitude environments show changes in their hypoxia response that differ from the short-term acclimation response of lowland populations of the same species. For example, in humans, see Erzurum et al. 2007 and Peng et al. 2017, where they show that the hypoxia response cascade, which starts with the gene HIF (Hypoxia-Inducible Factor) and includes the EPO gene, which codes for erythropoietin, which in turn activates the production of red blood cells, is LESS activated in high altitude individuals compared to the activation level in lowland individuals (which gives it its name). The present work adds to this body of knowledge, showing that the short-term response to hypoxia and the long-term one can affect different pathways and that acclimation/plasticity does not always predict what physiological traits will evolve in populations that colonize these environments over many generations and under additional selection pressures (UV exposure, temperature, nutrient availability). Altogether, this work provides new information on the evolution of reaction norms of genes associated with the physiological response to one of the main environmental variables that affects almost all animals, oxygen availability. It also provides an interesting model system to study this type of question further in a natural population of homeotherms.

      Erzurum, S. C., S. Ghosh, A. J. Janocha, W. Xu, S. Bauer, N. S. Bryan, J. Tejero, et al. 2007. Higher blood flow and circulating NO products offset high-altitude hypoxia among Tibetans. Proceedings of the National Academy of Sciences 104(45):17593-17598.

      Peng, Y., C. Cui, Y. He, Ouzhuluobu, H. Zhang, D. Yang, Q. Zhang, Bianbazhuoma, L. Yang, Y. He, et al. 2017. Down-regulation of EPAS1 transcription and genetic adaptation of Tibetans to high-altitude hypoxia. Molecular Biology and Evolution 34:818-830.

      Thank you for highlighting the potential novelty of our work in light of the broader field. We found it very interesting to discuss our results (from a bird species) together with similar findings from humans. In the revised version of the manuscript, we have discussed the short-term acclimation response and long-term adaptive evolution to a high-elevation environment, as well as how our work improves understanding of the relative roles of short-term plasticity and long-term adaptation. We appreciate the two important works pointed out by the reviewer and have cited them in the revised version of the manuscript.

      Reviewer #2 (Public Review):

      This is a well-written paper using gene expression in tree sparrow as model traits to distinguish between genetic effects that either reinforce or reverse initial plastic response to environmental changes. Tree sparrow tissues (cardiac and flight muscle) collected in lowland populations subject to hypoxia treatment were profiled for gene expression and compared with previously collected data in 1) highland birds; 2) lowland birds under normal condition to test for differences in directions of changes between initial plastic response and subsequent colonized response. The question is an important and interesting one but I have several major concerns on experimental design and interpretations.

      Thank you for a precise summary of our work and constructive comments to improve this study. We have addressed your concerns in greater detail below.

      Q1. The datasets consist of two sources of data. The hypoxia treated birds collected from the current study and highland and lowland birds in their respective native environment from a previous study. This creates a complete confounding between the hypoxia treatment and experimental batches that it is impossible to draw any conclusions. The sample size is relatively small. Basically correlation among tens of thousands of genes was computed based on merely 12 or 9 samples.

      We appreciate the critical comments from the reviewer. The reviewer raised the concerns about the batch effect from birds collected from the previous study and this study. There is an important detail we didn’t describe in the previous version. All tissues from hypoxia acclimated birds and highland and lowland birds have been collected at the same time (i.e., Qu et al. 2020). RNA library construction and sequencing of these samples were also conducted at the same time, although only the transcriptomic data of lowland and highland tree sparrows were included in Qu et al. (2020). The data from acclimated birds have not been published before.

      In the revised version of the manuscript, we also compared log-transformed transcripts per million (TPM) across all genes and determined the most conserved genes (i.e., coefficient of variation ≤ 0.3 and average TPM ≥ 1 for each sample) for the flight and cardiac muscles, respectively (Hao et al. 2023). We compared the median expression levels of these conserved genes and found no difference among the lowland, hypoxia-exposed lowland, and highland tree sparrows (Wilcoxon signed-rank test, P<0.05). As these results suggested little batch effect on the transcriptomic data, we used TPM values to calculate gene expression level and intensity. This methodological detail has been further clarified in the Methods, and we have also provided a new supplementary figure (Figure S5) to show the comparative results.

      Author response image 2.

      The median expression levels of the conserved genes (i.e., coefficient of variation ≤ 0.3 and average TPM ≥ 1 for each sample) did not differ among the lowland, hypoxia-exposed lowland, and highland tree sparrows (Wilcoxon signed-rank test, P<0.05).
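
      As an illustration only, the conserved-gene filter described above could be sketched as follows. The function names and the toy TPM values are hypothetical and do not come from the manuscript, and the sketch assumes that "average TPM ≥ 1" refers to the mean across samples and that the coefficient of variation is the standard deviation divided by the mean.

```python
from statistics import mean, median, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    m = mean(values)
    return float("inf") if m == 0 else pstdev(values) / m


def conserved_genes(tpm, max_cv=0.3, min_mean_tpm=1.0):
    """Return genes with CV <= max_cv and mean TPM >= min_mean_tpm."""
    return sorted(
        gene
        for gene, values in tpm.items()
        if mean(values) >= min_mean_tpm
        and coefficient_of_variation(values) <= max_cv
    )


def group_median(tpm, genes, sample_indices):
    """Median expression of the given genes over one group of samples."""
    return median(tpm[g][i] for g in genes for i in sample_indices)


# Toy data (hypothetical): three genes measured in six samples.
toy_tpm = {
    "geneA": [10.0, 11.0, 9.5, 10.5, 10.2, 9.8],  # stable and expressed
    "geneB": [0.2, 0.3, 0.1, 0.2, 0.4, 0.3],      # mean TPM below 1
    "geneC": [5.0, 50.0, 1.0, 30.0, 2.0, 80.0],   # highly variable
}

kept = conserved_genes(toy_tpm)
first_group_median = group_median(toy_tpm, kept, [0, 1, 2])
```

      Group medians computed this way for each population could then be passed to whichever statistical test is preferred; the sketch covers only the filtering and summarising steps.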

      The reviewer also raised the issue of sample size. We certainly would have liked to have more individuals in the study, but this was not possible due to the logistical problem of keeping wild birds in a common garden experiment for a long time. We have acknowledged this in the manuscript. To mitigate this, we tested the hypothesis of plasticity followed by genetic change using two different tissues (cardiac and flight muscles) and two different datasets (co-expressed gene-set and muscle-associated gene-set). As all these analyses show similar results, they indicate that the main conclusion drawn from this study is robust.

      Q2. Genes are classified into two classes (reversion and reinforcement) based on arbitrarily chosen thresholds. More "reversion" genes are found and this was taken as evidence reversal is more prominent. However, a trivial explanation is that genes must be expressed within a certain range and those plastic changes simply have more space to reverse direction rather than having any biological reason to do so.

      Thank you for the critical comments. The reviewer raised two questions, which we address separately. The first concern centers on the issue of arbitrarily chosen thresholds. In our manuscript, we used a range of thresholds, i.e., 50%, 100%, 150% and 200% of change in the gene expression levels of the ancestral lowland tree sparrow, to detect genes with reinforcement and reversion plasticity. With this design we wanted to explore the magnitudes of gene expression plasticity (e.g., Ho & Zhang 2018), and whether the strength of selection (i.e., genetic variation) changes with the magnitude of gene expression plasticity (e.g., Campbell-Staton et al. 2021).
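
      A minimal sketch of how such threshold-based categorization might work is given below. It assumes the general framework of Ho & Zhang (2018), in which the plastic change is the difference between the plastic and ancestral expression levels, the genetic change is the difference between the colonized and plastic levels, and a gene is counted only when both changes exceed a set fraction of the ancestral level. The function names and toy values are hypothetical, and the exact rule used in the manuscript may differ.

```python
THRESHOLDS = (0.5, 1.0, 1.5, 2.0)  # 50%, 100%, 150%, 200% of ancestral level


def classify_plasticity(lo, lp, la, threshold=0.5):
    """Classify one gene from ancestral (lo), plastic (lp) and colonized (la) levels."""
    plastic_change = lp - lo
    genetic_change = la - lp
    cutoff = threshold * lo
    # Both changes must be larger than the cutoff for the gene to be counted.
    if abs(plastic_change) <= cutoff or abs(genetic_change) <= cutoff:
        return "unclassified"
    # Same sign: the genetic change pushes expression further (reinforcement).
    # Opposite sign: the genetic change undoes the plastic change (reversion).
    return "reinforcement" if plastic_change * genetic_change > 0 else "reversion"


def count_by_threshold(genes, thresholds=THRESHOLDS):
    """Return {threshold: (n_reinforcement, n_reversion)} for (lo, lp, la) tuples."""
    counts = {}
    for t in thresholds:
        labels = [classify_plasticity(lo, lp, la, t) for lo, lp, la in genes]
        counts[t] = (labels.count("reinforcement"), labels.count("reversion"))
    return counts


# Toy expression levels (hypothetical): (ancestral, plastic, colonized).
toy_genes = [
    (10.0, 20.0, 8.0),
    (10.0, 20.0, 40.0),
    (10.0, 12.0, 30.0),
    (10.0, 45.0, 5.0),
]
toy_counts = count_by_threshold(toy_genes)
```

      Raising the threshold shrinks both categories, which is why comparing the two proportions across several thresholds is more informative than relying on a single cutoff.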

      As the reviewer pointed out, we now realize that this threshold selection is arbitrary. We have therefore implemented two other categorization schemes to test the robustness of the observation of unequal proportions of genes with reinforcement and reversion plasticity. Specifically, we used a parametric bootstrap procedure as described in Ho & Zhang (2019), which aims to identify genes whose classification results from genuine differences rather than random sampling errors. Bootstrap results suggested that genes exhibiting reversing plasticity significantly outnumber those exhibiting reinforcing plasticity, suggesting that our inference of an excess of genes with reversion plasticity is robust to random sampling errors. We have added these analyses to the revised version of the manuscript, and provided results in Figure 2d and Figure 3d.

      Author response image 3.

      Figure 2a (left) and Figure 2b (right). Frequencies of genes with reinforcement and reversion plasticity (>50%) and their subsets that acquire strong support in the parametric bootstrap analyses (≥ 950/1000).
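
      The parametric bootstrap described above can be illustrated with the following sketch. It assumes that each group's mean expression is resampled from a normal distribution centred on the observed mean with the observed standard error, and that a gene is retained when at least 950 of 1000 resamples reproduce the original category. All names, the classification rule, and the replicate values are hypothetical and are not taken from the manuscript or from Ho & Zhang.

```python
import random
from statistics import mean, stdev


def direction(lo, lp, la, threshold=0.5):
    """Assumed rule: compare plastic change (lp - lo) with genetic change (la - lp)."""
    pc, gc = lp - lo, la - lp
    cutoff = threshold * lo
    if abs(pc) <= cutoff or abs(gc) <= cutoff:
        return "unclassified"
    return "reinforcement" if pc * gc > 0 else "reversion"


def bootstrap_support(anc, pla, col, n_boot=1000, threshold=0.5, seed=0):
    """Return (observed category, number of resamples giving the same category)."""
    rng = random.Random(seed)

    def draw(replicates):
        # Resample a group mean from N(observed mean, standard error).
        se = stdev(replicates) / len(replicates) ** 0.5
        return rng.gauss(mean(replicates), se)

    observed = direction(mean(anc), mean(pla), mean(col), threshold)
    hits = sum(
        direction(draw(anc), draw(pla), draw(col), threshold) == observed
        for _ in range(n_boot)
    )
    return observed, hits


# Toy replicates (hypothetical) for one gene with low within-group noise.
label, support = bootstrap_support(
    anc=[10.0, 10.5, 9.5],
    pla=[20.0, 21.0, 19.0],
    col=[8.0, 8.5, 7.5],
)
strongly_supported = support >= 950
```

      Genes with noisy replicates receive lower support and drop out, so the surviving subset reflects differences that are unlikely to arise from sampling error alone.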

      In addition, we adopted a binning scheme (i.e., 20%, 40% and 60% bin settings along the spectrum of reinforcement/reversion plasticity). These analyses based on different categorization schemes revealed similar results, and suggested that our inference of an excess of genes with reversion plasticity is robust. We have provided these results in Supplementary Figures S2 and S4.

      Author response image 4.

      Figure S2 (A) and Figure S4 (B). Frequencies of genes with reinforcement and reversion plasticity in the flight and cardiac muscle. (A) For genes identified by WGCNA, all comparisons show that there are more genes showing reversion plasticity than those showing reinforcement plasticity for both the flight and cardiac muscles. (B) For genes associated with muscle phenotypes, all comparisons show that there are more genes showing reversion plasticity than those showing reinforcement plasticity for the flight muscle, while more than 50% of comparisons support an excess of genes with reversion plasticity for the cardiac muscle. Two-tailed binomial test: NS, non-significant; *, P < 0.05; **, P < 0.01; ***, P < 0.001.
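
      For readers unfamiliar with the test named in this legend, a two-tailed exact binomial test against equal proportions can be sketched as below. The function name and the gene counts are hypothetical, and the sketch uses the common convention of summing the probabilities of all outcomes that are no more likely than the observed one; it is intended for modest gene counts only.

```python
from math import comb


def binomial_two_tailed(k, n, p=0.5):
    """Exact two-tailed binomial P value for k successes in n trials."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    # Small tolerance so that outcomes tied with the observed one are included.
    cutoff = pmf[k] * (1 + 1e-9)
    return min(1.0, sum(prob for prob in pmf if prob <= cutoff))


# Hypothetical counts: 70 reversion genes and 30 reinforcement genes.
n_reversion, n_reinforcement = 70, 30
p_value = binomial_two_tailed(n_reversion, n_reversion + n_reinforcement)
```

      Under the null hypothesis that reversion and reinforcement are equally frequent, a small P value indicates an excess of one category over the other.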

      The second issue that the reviewer raised is that the plastic changes simply have more space to reverse direction rather than having any biological reason to do so. While the causal reason why more genes have their expression levels reversed than reinforced at the late stages is still contentious, an increasing number of studies show that gene expression plasticity at the early stage may be functionally maladaptive in the novel environments that species have recently colonized (e.g., lizards, Campbell-Staton et al. 2021; Escherichia coli, yeast, guppies, chickens and babblers, Ho and Zhang 2018; Ho et al. 2020; Kuo et al. 2023). Our comparisons based on the two gene sets that are associated with muscle phenotypes corroborate these previous studies and show that initial gene expression plasticity may be nonadaptive in novel environments (e.g., Ghalambor et al. 2015; Ho & Zhang 2018; Ho et al. 2020; Kuo et al. 2023; Campbell-Staton et al. 2021).

      Q3. The correlation between plastic change and evolved divergence is an artifact due to the definitions of adaptive versus maladaptive changes. For example, the definition of adaptive changes requires that plastic change and evolved divergence are in the same direction (Figure 3a), so the positive correlation was a result of this selection (Figure 3d).

      The reviewer raised the issue that the correlation between plastic change and evolved divergence (for example, in Figure 3d) is an artifact of the definition of adaptive versus maladaptive changes. We agree with the reviewer that the correlation analysis is circular, because the definition of adaptive and maladaptive plasticity depends on whether the direction of plastic change matches or opposes that of the colonized tree sparrows. We have thus removed the previous Figure 3d-e and the related text from the revised version of the manuscript. We have also changed Figure 3a to further clarify the schematic framework.
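
      The circularity can be illustrated with a small simulation. In the sketch below, plastic change and evolved divergence are drawn independently, so there is no true relationship between them, yet restricting the analysis to genes whose two changes share the same direction produces a clearly positive correlation. The variable names, sample size, and random seed are arbitrary choices made for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(1)
n_genes = 20000

# Independent noise: no real link between plastic change and evolved divergence.
plastic = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
evolved = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]

# "Adaptive" genes selected by definition: both changes in the same direction.
same_direction = [(p, e) for p, e in zip(plastic, evolved) if p * e > 0]

r_all = pearson(plastic, evolved)
r_selected = pearson(
    [p for p, _ in same_direction], [e for _, e in same_direction]
)
```

      Here `r_all` stays close to zero while `r_selected` is strongly positive, which is why a correlation computed within a category defined by direction cannot be taken as independent evidence.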

    1. eLife Assessment

      This study presents a fundamental discovery of how cerebellar climbing fibers modulate plastic changes in the somatosensory cortex by identifying both the responsible cortical circuit and the anatomical pathways. The evidence supporting the conclusions is convincing and well supported by modern neuroscience methodologies. Overall, this work represents a significant contribution that will be of broad interest to neuroscientists, especially those studying the long-distance cerebellar influence on non-motor brain functions.

    2. Reviewer #1 (Public review):

      Summary:

      Silbaugh, Koster, and Hansel investigated how the cerebellar climbing fiber (CF) signals influence neuronal activity and plasticity in mouse primary somatosensory (S1) cortex. They found that optogenetic activation of CFs in the cerebellum modulates responses of cortical neurons to whisker stimulation in a cell-type-specific manner and suppresses potentiation of layer 2/3 pyramidal neurons induced by repeated whisker stimulation. This suppression of plasticity by CF activation is mediated through modulation of VIP- and SST-positive interneurons. Using transsynaptic tracing and chemogenetic approaches, the authors identified a pathway from the cerebellum through the zona incerta and the thalamic posterior medial (POm) nucleus to the S1 cortex, which underlies this functional modulation.

      Strengths:

      This study employed a combination of modern neuroscientific techniques, including two-photon imaging, opto- and chemo-genetic approaches, and transsynaptic tracing. The experiments were thoroughly conducted, and the results were clearly and systematically described. The interplay between the cerebellum and other brain regions - and its functional implications - is one of the major topics in this field. This study provides solid evidence for an instructive role of the cerebellum in experience-dependent plasticity in the S1 cortex.

      Weaknesses:

      There may be some methodological limitations, and the physiological relevance of the CF-induced plasticity modulation in the S1 cortex remains unclear. In particular, it has not been elucidated how CF activity influences the firing patterns of downstream neurons along the pathway to the S1 cortex during stimulation.

      (1) Optogenetic stimulation may have activated a large population of CFs synchronously, potentially leading to strong suppression followed by massive activation in numerous cerebellar nuclear (CN) neurons. Given that there is no quantitative estimation of the stimulated area or number of activated CFs, observed effects are difficult to interpret directly. The authors should at least provide the basic stimulation parameters (coordinates of stim location, power density, spot size, estimated number of Purkinje cells included, etc.).

      (2) There are CF collaterals directly innervating CN (PMID:10982464). Therefore, antidromic spikes induced by optogenetic stimulation may directly activate CN neurons. On the other hand, a previous study reported that CN neurons exhibit only weak responses to CF collateral inputs (PMID: 27047344). The authors should discuss these possibilities and the potential influence of CF collaterals on the interpretation of the results.

      (3) The rationale behind the plasticity induction protocol for RWS+CF (50 ms light pulses at 1 Hz during 5 min of RWS, with a 45 ms delay relative to the onset of whisker stimulation) is unclear.

      a) The authors state that 1 Hz was chosen to match the spontaneous CF firing rate (line 107); however, they also introduced a delay to mimic the CF response to whisker stimulation (line 108). This is confusing, and requires further clarification, specifically, whether the protocol was designed to reproduce spontaneous or sensory-evoked CF activity.

      b) Was the timing of delivering light pulses constant or random? Given the stochastic nature of CF firing, randomly timed light pulses with an average rate of 1Hz would be more physiologically relevant. At the very least, the authors should provide a clear explanation of how the stimulation timing was implemented.

      (4) CF activation modulates inhibitory interneurons in the S1 cortex (Figure 2): responses of interneurons in S1 to whisker stimulation were enhanced upon CF coactivation (Figure 2C), and these neurons were predominantly SST- and PV-positive interneurons (Figure 2H, I). In contrast, VIP-positive neurons were suppressed only in the late time window of 650-850 ms (Figure 2G). If the authors' hypothesis (that the activity of VIP neurons regulates SST- and PV-neuron activity during RWS+CF) is correct, then the activity of SST- and PV-neurons should also be increased during this late time window. The authors should clarify whether such temporal dynamics were observed or could be inferred from their data.

      (5) Transsynaptic tracing from CN nicely identified zona incerta (ZI) neurons and their axon terminals in both POm and S1 (Figure 6 and Figure S7).

      a) Which part of the CN (medial, interposed, or lateral) is involved in this pathway is unclear.

      b) Were the electrophysiological properties of these ZI neurons consistent with those of PV neurons?

      c) There appears to be a considerable number of axons of these ZI neurons projecting to the S1 cortex (Figure S7C). Would it be possible to estimate the relative density of axons projecting to the POm versus those projecting to S1? In addition, the authors should discuss the potential functional role of this direct pathway from the ZI to the S1 cortex.

    3. Reviewer #2 (Public review):

      Summary:

      The authors examined long-distance influence of climbing fiber (CF) signaling in the somatosensory cortex by manipulating whiskers through stimulation. Also, they examined CF signaling using two-photon imaging and mapped projections from the cerebellum to the somatosensory cortex using transsynaptic tracing. As a final manipulation, they used chemogenetics to perturb parvalbumin-positive neurons in the zona incerta and recorded from climbing fibers.

      Strengths:

      There are several strengths to this paper. The recordings were carefully performed, and AAVs used were selective and specific for the cell types and pathways being analyzed. In addition, the authors used multiple approaches that support climbing fiber pathways to distal regions of the brain. This work will impact the field and describes nice methods to target difficult-to-reach brain regions, such as the inferior olive.

      Weaknesses:

      There are some details in the methods that could be explained further. The discussion was very short and could connect the findings in a broader way.

    4. Reviewer #3 (Public review):

      Summary:

      The authors developed an interesting novel paradigm to probe the effects of cerebellar climbing fiber activation on short-term adaptation of somatosensory neocortical activity during repetitive whisker stimulation. Normally, RWS potentiated whisker responses in pyramidal cells and weakly suppressed them in interneurons, lasting for at least 1 h. Crus II optogenetic climbing fiber activation during RWS reduced or inverted these adaptive changes. This effect was generally mimicked or blocked with chemogenetic SST or VIP activation/suppression as predicted based on their "sign" in the circuit.

      Strengths:

      The central finding about CF modulation of S1 response adaptation is interesting, important, and convincing, and provides a jumping-off point for the field to start to think carefully about cerebellar modulation of neocortical plasticity.

      Weaknesses:

      The SST and VIP results appeared slightly weaker statistically, but I do not personally think this detracts from the importance of the initial finding (if there are multiple underlying mechanisms, modulating one may reproduce only a fraction of the effect size). I found the suggestion that zona incerta may be responsible for the cerebellar effects on S1 to be a more speculative result (it is not so easy with existing technology to effectively modulate this type of polysynaptic pathway), but this may be an interesting topic for the authors to follow up on in more detail in the future.

    1. eLife Assessment

      This valuable manuscript presents findings supported by solid data to identify a surprising glia-exclusive function for betapix in vascular integrity and angiogenesis. The manuscript also describes the optimisation of a modified CRISPR-based Zwitch approach to generate conditional knockouts in zebrafish.

    2. Reviewer #1 (Public review):

      The manuscript by Chiu et al describes the modification of the Zwitch strategy to efficiently generate conditional knockouts of zebrafish betapix. They leverage this system to identify a surprising glia-exclusive function of betapix in mediating vascular integrity and angiogenesis. Betapix has been previously associated with vascular integrity and angiogenesis in zebrafish, and betapix function in glia has also been proposed. However, this study identifies glial betapix in vascular stability and angiogenesis for the first time.

      The study derives its strength from the modified CRISPR-based Zwitch approach to identify the specific role of glial betapix (and not neuronal, mural or endothelial). Using RNA-in situ hybridisation and analysis of scRNA-Seq data, they also identify delayed maturation of neurons and glia and implicate a reduction in stathmin levels in the glial knockouts in mediating vascular homeostasis and angiogenesis. The study also implicates a betapix-zfhx3/4-vegfa axis in mediating cerebral angiogenesis.

      There is both technical (the generation of conditional KOs) and knowledge-related (the exclusive role of glial betapix in vascular stability/angiogenesis) novelty in this work that is going to benefit the community significantly.

      However, the study has the following major weaknesses:

      (1) The lack of glia-specific rescue of betapix in the global KOs/mutants prevents the study from making a compelling case for the unexpected glial-specific function in vascular development and stability.

      (2) Given the known splice-isoform specific function of betapix in haemorrhaging (Liu et al, 2007), at least an expression profile of the isoforms in glia at the relevant timepoints would have further underscored betapix function.

      (3) Direct evidence of the status of endothelial cell proliferation/survival deficits, if any, in the glial betapix KOs would have provided a key mechanistic handle. It becomes all the more relevant as Liu et al, 2012 have demonstrated reduced proliferation of endothelial cells in bbh fish and linked it to deficits in angiogenesis.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      The manuscript by Chiu et al describes the modification of the Zwitch strategy to efficiently generate conditional knockouts of zebrafish betapix. They leverage this system to identify a surprising glia-exclusive function of betapix in mediating vascular integrity and angiogenesis. Betapix has been previously associated with vascular integrity and angiogenesis in zebrafish, and betapix function in glia has also been proposed. However, this study identifies glial betapix in vascular stability and angiogenesis for the first time.

      The study derives its strength from the modified CRISPR-based Zwitch approach to identify the specific role of glial betapix (and not neuronal, mural, or endothelial). Using RNA-in situ hybridization and analysis of scRNA-Seq data, they also identify delayed maturation of neurons and glia and implicate a reduction in stathmin levels in the glial knockouts in mediating vascular homeostasis and angiogenesis. The study also implicates a betapix-zfhx3/4-vegfa axis in mediating cerebral angiogenesis.

      There is both technical (the generation of conditional KOs) and knowledge-related (the exclusive role of glial betapix in vascular stability/angiogenesis) novelty in this work that is going to benefit the community significantly.

      While the text is well written, it often elides details of experiments and relies on implicit understanding on the part of the reader. Similarly, the figure legends are laconic and often fail to provide all the relevant details.

      We thank the reviewer for their overall support of our manuscript. We have now revised the manuscript text and figure legends to include as many of the relevant details as we can.

      Specific comments:

      (1) While the evidence from cKO's implicating glial betapix in vascular stability/angiogenesis is exciting, glia-specific rescue of betapix in the global KOs/mutants (like those performed for stathmin) would be necessary to make a water-tight case for glial betapix.

      We fully agree with the reviewer that it would be ideal to examine glia-specific rescue of betaPix in its global KOs. However, it is difficult to achieve optimal transient expression of betaPix by injecting a gfap:betaPix plasmid clone, and it takes a long time to establish a stable gfap:betaPix transgenic line for rescuing the mutant phenotypes. We would like to pursue this line of research in the future.

      (2) Splice variants of betapix have been shown to have differential roles in haemorrhaging (Liu, 2007). What are the major glial isoforms, and are there specific splice variants in the glial that contribute to the phenotypes described?

      We agree that it would be important to address whether any specific splice variants in glia contribute to the betaPix mutant phenotypes. Previous studies have shown that isoform a of betaPix is ubiquitously expressed across various tissues, while isoforms b, c, and d are predominantly expressed in the nervous system. In mice, the expression level of the betaPix-d isoform is essential for neurite outgrowth and migration. We have not directly assessed glia-specific betaPix isoforms in the nervous system, so our current data cannot determine whether a specific isoform is involved in the glial function of betaPix. The Zwitch cassette of betaPix resides in intron 5 and thus disrupts all transcripts when Cre is activated. Nevertheless, we are fully aware of the potential value of identifying glial betaPix isoforms and their direct downstream targets. Further studies to dissect their roles in cerebral vascular development and disease are part of our future plans.

      (3) Liu et al, 2012 demonstrated reduced proliferation of endothelial cells in bbh fish and linked it to deficits in angiogenesis. Are there proliferation/survival defects in endothelial cells in the glial KOs?

      We thank the reviewer for highlighting endothelial cell phenotypes in betaPix mutants. We are aware that endothelial cell defects might be directly linked to the angiogenesis defects in the mutants. We assessed and quantified endothelial migration by measuring the length of developing central arteries, but we did not examine endothelial cell proliferation/survival defects in the glial KOs. In our scRNA-seq analysis, the proportion of endothelial cells was reduced in betaPix-deficient samples, indicating that endothelial cell proliferation/survival might be decreased in mutants. In this endothelial cell cluster, we found a disrupted transcriptional landscape in a set of angiogenesis-associated genes (Figure 6M). While these analyses highlight an altered angiogenic transcriptome profile in endothelial cells of betaPix knockouts, we acknowledge that our study does not directly address proliferation/survival phenotypes in endothelial cells, which warrants future investigation of the role of betaPix in regulating glia-endothelial cell interactions.

      Reviewer #2 (Public review):

      Summary:

      Using a genetic model of beta-pix conditional trap, the authors are able to regulate the spatio-temporal depletion of beta-pix, a gene with an established role in maintaining vascular integrity (shown elsewhere). This study provides strong in vivo evidence that glial beta-pix is essential to the development of the blood-brain barrier and maintaining vascular integrity. Using genetic and biochemical approaches, the authors show that PAK1 and Stathmins are in the same signaling axis as beta-pix, and act downstream to it, potentially regulating cytoskeletal remodeling and controlling glial migration. How exactly the glial-specific (beta-pix driven-) signaling influences angiogenesis or vascular integrity is not clear.

      Strengths:

      (1) Developing a conditional gene-trap genetic model which allows for tracking knockin reporter driven by endogenous promoter, plus allowing for knocking down genes. This genetic model enabled the authors to address the relevant scientific questions they were interested in, i.e., a) track expression of beta-pix gene, b) deletion of beta-pix gene in a cell-specific manner.

      (2) The study reveals the glial-specific role of beta-pix, which was unknown earlier. This opens up avenues for further research. (For instance, how do such (multiple) cell-specific signaling converge onto endothelial cells which build the central artery and maintain the blood-brain barriers?)

      We thank the reviewer for their overall support of our work.

      Weaknesses:

      Major:

      (1) The study clearly establishes a role of beta-pix in glial cells, which regulates the length of the central artery and keeps the hemorrhages under control. Nevertheless, it is not clear how this is accomplished.

      (a) Is this phenotype (hemorrhage) a result of the direct interaction of glial cells and the adjacent endothelial cells? If direct, is the communication established through junctions or through secreted molecules?

      We thank the reviewer for this critical question. We attempted to address this issue by performing live imaging using light-sheet confocal microscopy, but failed to achieve sub-cellular resolution. We do not currently have data to address this critical issue, which warrants future investigation.

      (b) The authors do not exclude the possibility that the effects observed on endothelial cells (quantified as length of central artery) could be secondary to the phenotype observed with deletion of glial beta-pix. For instance, can glial beta-pix regulate angiogenic factors secreted by peri-vascular cells, which consequently regulate the length of the central artery or vascular integrity?

      We thank the reviewer for this critical point. While we found major defects in endothelial cell migration, as quantified by central artery length, we cannot rule out the participation of signals from other peri-vascular cells. We fully agree that it will be important to address the cell-type-specific relationships mediated by angiogenic factors. Of note, degradation of the extracellular matrix and focal adhesions is critical for the hemorrhagic phenotypes of bbh mutants. In a previously published study from our group, we found that suppressing the globally induced MEK/ERK/MMP9 signaling in bbh mutants significantly decreases hemorrhages. Accordingly, we edited a paragraph in the Discussion section on pages 24-25. We plan to continue investigating whether the complex interactions in the perivascular space contribute to the disruption of vascular integrity, as well as the cross-talk among different cell types during vascular development in these mutants. We believe that our model of glia-specific betaPix function will guide further study of these cellular interactions in follow-up work.

      (c) The pictorial summary of the findings (Figure 7) does not include Zfhx or Vegfa. The data do not provide clarity on how these molecules contribute (directly or indirectly) to endothelial cell integrity. Vegfaa is expressed in the central artery, but the expression of the receptor in these endothelial cells is not shown. Similarly, all other experimental analyses for Zfhx and Vegfa expression were performed in glial cells. More experimental evidence is necessary to show the regulation of angiogenesis (of endothelial cells) by glial beta-pix. Is the Vegfaa receptor present on central arteries, and how does glial depletion of beta-pix affect its expression or response of central artery endothelial cells (both pertaining to angiogenesis and vascular integrity).

      We thank the reviewer for pointing out this critical issue. We have now revised the pictorial summary in Figure 7 to include Zfhx and Vegfa. The key receptors of the VEGF-A ligand are VEGFR-1 and VEGFR-2. In zebrafish, expression of Vegfr-2, also known as kdrl, is well documented in endothelial cells, including those of the hindbrain central arteries. We fully agree that it would be of great value to assess changes in the kdrl expression pattern after betaPix deficiency in vivo. How VEGFA-VEGFR2 signaling in endothelial cells is altered in betaPix mutants warrants future investigation.

      (2) Microtubule stabilization via glial beta-pix, claimed in Figure 5M, is unclear. Magnified images for h-betapix OE and h-stmn-1 glial cells are absent. Is this migration regulated by beta-pix through its GEF activity for Cdc42/Rac?

      We have now revised Figure 5M to include magnified images for the h-betaPIX and h-STMN1 overexpression groups. It has been shown that there is a positive feedback loop of microtubule regulation consisting of Rac1-Pak1-Stathmin at the cell edge (Zeitz and Kierfeld, 2014 Biophys J.). Previous studies have shown that betaPix activates Rac1 through its GEF activity and also regulates the activity of Pak1 via direct binding. As reported by Kwon et al., the betaPix-d isoform promotes neurite outgrowth via the PAK-dependent inactivation of Stathmin1. In this work, we did not assess the binding activity of betaPix to Rac1 or Pak1. Nevertheless, our data from the IPA-3 rescue experiments suggest that betaPix deficiency impairs migration through Pak1 signaling.

      (3) Hemorrhages are caused by compromised vascular integrity, which was not measured (either qualitatively or quantitatively) throughout the manuscript. The authors do measure the length of the central artery in several gene deletion models (2I, 3C. 5F/J, 6G/K), which is indicative of artery growth/ angiogenesis. How (if at all) defects in angiogenesis are an indication of hemorrhage should be explained or established. Do these angiogenic growth defects translate into junctional defects at later developmental time points? Formation and maintenance of endothelial cell junctions within the hemorrhaging arteries should be assessed in fish with deleted beta-pix from astrocytes.

      We appreciate the reviewer’s point and agree that this is a key aspect we need to clarify. To address junctional defects in our model, we re-examined the scRNA-seq data and found mild downregulation of the junction protein claudin-5a (cldn5a) in the transcriptome analysis of the endothelial cluster (Author response image 1). We agree in principle that single-cell RNA sequencing findings should be validated by immunostaining. While we did not measure junctional defects directly in this work, we have previously reported comparable expression of the tight junction protein zonula occludens-1 (ZO1) between siblings and bbh mutants (Yang et al., 2017 Dis Model Mech). In zebrafish, a functionally characterized blood-brain barrier (BBB) is identified only after 3 dpf. The lack of a mature BBB might be due to the immature status of the barrier signature at this developmental stage. The hemorrhage phenotype occurs around 40 hpf, and hematomas are almost completely absorbed at later stages, since most mutants recover and survive to adulthood. Thus, future studies are needed to address the junctional characteristics at the cellular and molecular levels in later developmental stages of betaPix mutants.

      Author response image 1.

      Violin plots showing cdh5, cldn5a, cldn5b and oclna expression levels in endothelial sub-cluster. ctrl, control siblings; ko, betaPix knockouts (CRISPR mutants); 1d or 2d, 1 or 2 days post fertilization.

      (4) More information is required about the quality control steps for 10X sequencing (Figure 4, number of cells, reads, etc.). What steps were taken to validate the data quality? The EC groups, 1 and 2-days post-KO are not visible in 4C. One appreciates that the progenitor group is affected the most 2 days post-KO. But since the effects are expected to be on the endothelial cell group as well (which is shown in in vivo data), an extensive analysis should be done on the EC group (like markers for junctional integrity, angiogenesis, mesenchymal interaction, etc.). Are Stathmins limited to glial cells? Are there indicators for angiogenic responses in endothelial cells?

      We thank the reviewer for these critical suggestions. Detailed statements about the quality control steps for 10X sequencing are now provided in the Materials and Methods section. We validated the data quality through multiple steps, including verification of the number of viable cells used in the experiment, assessment of peak shapes and fragment sizes of the scRNA-seq libraries, confirmation of sufficient cell counts and sequencing reads for data analysis, and implementation of stringent filtering steps to exclude low-quality cells. Stathmin expression is shown in the violin plots in Figure 4E, and stmn1a, stmn1b and stmn4l expression is shown in the UMAP plots in Figure S6C. This expression is not limited to glial cells but is distributed more widely among zebrafish tissues. We would like to point out that, despite their small number, the endothelial cell clusters are shown in brown in Figure 4C. The proportions of the EC groups, split by the four samples, are visualized in Figure S6B and show a significant reduction in betaPix knockouts at 2 dpf, a trend similar to that of the glial progenitors. In addition, gene ontology analysis identified a set of down-regulated angiogenic genes in the endothelial cluster (Figure 6M). We realize that our interpretation of the endothelial cell phenotypes was not sufficiently clear in this work and have now added sentences to the manuscript text on pages 16-17. As noted above, future studies are needed to address how glial betaPix regulates endothelial cell and BBB function.
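
      As an illustration only, and not the authors' actual pipeline, the kind of per-cell filtering step described above can be sketched in Python as follows. The `CellQC` record, the function names, and every threshold value are hypothetical assumptions chosen for the example; the thresholds the authors used are those stated in their Materials and Methods.

```python
from dataclasses import dataclass


@dataclass
class CellQC:
    """Per-cell summary metrics (hypothetical record for illustration)."""
    barcode: str
    n_genes: int          # number of genes detected in the cell
    n_umis: int           # total UMI counts in the cell
    mito_fraction: float  # fraction of counts from mitochondrial genes


# Hypothetical thresholds, for illustration only (not taken from the manuscript).
MIN_GENES = 200           # fewer genes suggests an empty droplet or dying cell
MAX_GENES = 6000          # many more genes suggests a doublet
MIN_UMIS = 500            # too few counts for reliable quantification
MAX_MITO_FRACTION = 0.10  # high mitochondrial content suggests a damaged cell


def passes_qc(cell: CellQC) -> bool:
    """Return True if the cell satisfies every filtering criterion."""
    return (
        MIN_GENES <= cell.n_genes <= MAX_GENES
        and cell.n_umis >= MIN_UMIS
        and cell.mito_fraction <= MAX_MITO_FRACTION
    )


def filter_cells(cells: list[CellQC]) -> list[CellQC]:
    """Keep only the cells that pass all quality-control criteria."""
    return [cell for cell in cells if passes_qc(cell)]


# Made-up example cells: one good cell and three typical failure modes.
example_cells = [
    CellQC("AAAC", n_genes=1500, n_umis=4000, mito_fraction=0.03),   # passes
    CellQC("AAAG", n_genes=120, n_umis=300, mito_fraction=0.02),     # too few genes
    CellQC("AACT", n_genes=2500, n_umis=9000, mito_fraction=0.25),   # high mito
    CellQC("AAGT", n_genes=7200, n_umis=40000, mito_fraction=0.04),  # likely doublet
]

if __name__ == "__main__":
    kept = filter_cells(example_cells)
    print([cell.barcode for cell in kept])
```

      In practice such cutoffs are usually chosen by inspecting the distributions of these metrics in each sample rather than being fixed in advance.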

      Reviewing Editor Comments:

      comments on your manuscript. Addressing comments 1-3 from Reviewer 1 and comment 1 and its subparts from Reviewer 2 (major weaknesses) will significantly improve the manuscript by reinforcing the cell autonomous requirement of betaPix and also gain mechanistic insights. In addition, extensive proofreading and editing of the text, as well as changes to the figure, figure legends, and the discussion as indicated by both reviewers, will improve the readability and clarity of this manuscript.

      We thank the Reviewing Editor for their support of this manuscript. As noted above, we have tried to address the reviewers’ comments using the data obtained in this work, together with our plans for future investigations. We have now extensively proofread and edited the manuscript text and figure legends to improve the readability and clarity of this manuscript.

      Reviewer #1 (Recommendations for the authors):

      (1) The Discussion is written like an introduction with very little engagement with the data generated in the manuscript. The role of betapix-Pak-stathmin and betapix-zfhx3/4-vegfaa is barely discussed and contextualised vis-à-vis the current knowledge in the field.

      We appreciate the reviewer’s critical comments regarding the Discussion section. We have now revised the manuscript text on pages 20-23 to address the role of betapix-Pak-stathmin and betapix-zfhx3/4-vegfaa axis with contributions from this work.

      (2) Line 145: "light sheet microscopy" - explain that this was only for experiments involving fluorescence. Currently, it reads as if the data presented in Figures 1D and E are also obtained via light sheet microscopy. E.g., the paragraph starting on line 139 does not say what line was imaged (and what it labels) to reach the conclusions reached. This detail is not there even in the associated figure legend. Similarly, line 153 discusses radial glia, but there is no indication that these were labelled using Tg (GFAP:GFP) except in the figure annotation. There are various instances of such omissions throughout the text, and they should be remedied to indicate what each line is and what it labels, at least in the first instance.

      We thank the reviewer for these thoughtful points. In this revised version, we have incorporated more statements of the objectives and methodologies in the text on pages 8-9. We hope that the revised manuscript presents the data better by clarifying the methodologies and materials used in this work.

      (3) Figure 1E legend: What is the haemorrhage percentage? Is it the number of embryos per experiment showing hemorrhage? Indicate in the text. In the right panel, what is the number of embryos used? Please ensure all numbers (number of embryos, experiments, etc) used to plot any data in the set of figures in the entire manuscript are clearly indicated.

      We thank the reviewer for the suggestion. In this revised version, we have incorporated more detailed statements in the figures and figure legends to show the numbers of embryos used.

      (4) The Discussion section suddenly introduces the blood-brain barrier and extensively discusses it. However, while cerebral haemorrhage can disrupt the BBB and exacerbate the effects of the haemorrhage, this manuscript does not suggest that a weakened BBB is the cause of haemorrhages in betapix mutants. More likely, betapix stabilises and maintains vascular integrity, and loss of this function causes haemorrhaging and subsequent disruption of the BBB. The glial function noted in this study is likely to be distinct from the glial function in BBB development and maintenance. The authors do not show any direct evidence for the latter. These should be shortened, and only relevant aspects facilitating contextualisation of data generated in this manuscript should be retained.

      We have now revised the Discussion section to shorten the introduction of the blood-brain barrier and to add statements according to the suggestions from both reviewers. We hope that the revisions provide a more relevant and balanced discussion.

      (5) Is the scratch assay in Figure 5 controlled for differences in cell proliferation among the different manipulations?

      We plated the same number of cells in each group and cultured them under the same conditions. Before conducting the scratch assay, we replaced the medium with serum-free culture medium to reduce the effect of cell proliferation among the different manipulation groups.

      (6) In the glioblastoma experiments involving betapix KD, does stathmin RNA/protein decrease? What about Ser 16 phosphorylation (as shown for neurons in Kwon et al, 2020)?

      STMN1 RNA was down-regulated by betaPIX deficiency, and this was rescued by betaPIX overexpression in glial cells (Author response image 2). These results are similar to those from the in vivo analysis (Figure 5A, 5B and S7A). We agree with the reviewer that it would have been ideal to examine Ser 16 phosphorylation of Stathmin in our models. However, we believe that our data have established that Stathmins function downstream of betaPix.

      Author response image 2.

      qRT-PCR analysis showing that betaPIX over-expression (betaPix OE) rescued STMN1 expression after betaPIX siRNA knockdown (betaPix KD) in U251 cells. Data are presented as mean ± SEM; one-way ANOVA with Dunnett's test; individual P values are indicated in the figure.

      (7) How was the rescue of betapix in glioblastoma cells with siRNA-mediated betapix knockdown performed? Is this by betapix-resistant cDNA? Further, no information about isoforms of betapix (both for siRNA-mediated KD and rescue) or stathmin is provided.

      Similar to our Zwitch method, which disrupts all betaPix transcripts in vivo, the knockdown of human betaPIX was designed to target a region conserved across all transcripts in glioblastoma cell lines. The human betaPIX used for rescue was obtained from the U251 cDNA library, so ideally all isoforms enriched in this glioblastoma cell line would be isolated. The missing details are now provided in the Materials and Methods section, page 26.

      (8) It is unclear what the authors' thoughts are on the decrease in stathmin observed and the functional outcome of this decrease. The Discussion could benefit from this.

      Thanks. We have now incorporated a new paragraph in the Discussion section on pages 21-22 addressing the down-regulated expression of Stathmins and the functional outcome of this decrease.

      (9) Zfhx4 mRNA injection is performed on bbh and betapixKO (is this a global or glial KO?) and found to rescue haemorrhaging. While vegfaa mRNA increases, it is formally possible that the rescue is not due to the increase in vegfaa (or that vegfaa is sufficient). Injection of vegfaa mRNA could address this issue.

      Zfhx4 mRNA injection was performed on bbh mutants and global betapix knockouts (CRISPR mutants). To avoid confusion, we have now included a sentence highlighting that global knockout mutants were used for this rescue experiment. For the second part, we acknowledge that this study cannot definitively prove the necessity of increased vegfaa levels in the rescue experiment. However, our data establish Zfhx3/4 as novel downstream effectors of betaPix in cerebral vessel development, and these effects might partly be linked to angiogenic responses regulated by Zfhx3/4. In this revised version, we cautiously propose that Vegfaa signals act downstream of the betaPix-Zfhx3/4 axis, and we highlight, in the Discussion section on page 24, the weakness that our manuscript does not fully investigate the sufficiency of Vegfaa. We intend to pursue a more extensive analysis in our follow-up studies.

      (10) A significant part of the manuscript looks at angiogenesis/vascularisation, however, the title of the paper only reflects vessel integrity (which can be distinct from angiogenesis).

      Thanks. We have now changed the title to: Glial betaPix is essential for blood vessel development in the zebrafish brain

      (11) Line 366: The BBB abbreviation is used without indicating the full form. Perhaps this can be introduced in the preceding sentence.

      We have now edited the following sentence: “The maturation hallmark of central nervous system (CNS) vasculature is acquisition of blood brain barrier (BBB) properties, establishing a stable environment ...” in lines 386-387, Discussion section.

      (12) Line 371: "rupture" and not "rapture".

      We thank the reviewer for pointing out the spelling error, and have now made this correction. 

      (13) Line 416: "is enriched" instead of "enriches"?

      We have now edited as: “...end feet that is enriched with aquaporin-4 ...” in line 411, page 19. 

      (14) The sentence in lines 121-123 should be simplified.

      We have now revised this sentence as the following: “A previous work has shown that bubblehead (bbh<sup>fn40a</sup>) mutant has a global reduction in betaPix transcripts, and bbh<sup>m292</sup> mutant has a hypomorphic mutation in betaPix, thus establishing that betaPix is responsible for bubblehead mutant phenotypes [10]”. 

      (15) No mention in the text of what o-dianisidine labels.

      We have now edited the following sentence: “By using o-dianisidine staining to label hemoglobins, we found severe brain hemorrhages ...” in lines 131-133.

      (16) Line 165: Sentence requires improvement. Perhaps "Vascularisation of the central arteries in the zebrafish hindbrain ...".

      We have now edited this sentence as: “Vascularisation of the central arteries in the zebrafish hindbrain starts at 29 hpf.” in this revised version (line 176). 

      (17) Line 184: Why is "hematopoiesis" mentioned? The genesis of blood cells is not tested anywhere in the manuscript.

      Thanks. We have now edited this statement as: “IPA-3 treatment had no effect on haemorrhage induction in betaPix<sup>ct/ct</sup> control siblings.”

      (18) Line 222-223: Improve "increasing trends". Perhaps "increased relative proportions". Clarify "progenitors" means neuronal and glial progenitors.

      We have now edited this statement: “we found that most neuronal clusters increased relative proportions ...” in this revised version.

      (19) Line 232-233: "arrow indicates" - perhaps "indicated by the arrow"? Also, the arrow indicating gfap needs to be mentioned in the Figure S6A legend. Cannot understand what is meant by "as of its enriched gfap".

      We have now edited in the text as: “Figure S6A, indicated by the arrow”, and added “Box area and arrow highlighting gfap expressions.” in Figure S6 legend. To avoid confusion, we have revised "as of its enriched gfap" sentence as the following: “We next focused on the progenitor cluster owing to the enriched gfap expression and the significantly reduced numbers of cells in this cluster by betaPix deficiency.”

      (20) Line 239 - 240: While the sentence says "... revealed three major categories:", well, more than 3 are mentioned subsequently.

      To avoid possible confusion in the text, we have now removed the sub-category examples and presented the data as: “three major categories: epigenetic remodeling, microtubule organizations and neurotransmitter secretion/transportation (Figure 4D).” 

      (21) Line 252: Stathmins negatively regulate microtubule stability. Why are they referred to as "microtubule polymerization genes stathmins"?

      We are thankful to the reviewer for pointing out this error, and we have now corrected the text to “microtubule-destabilizing protein Stathmins”.

      (22) Line 262-265: The citation used to indicate concurrence with mouse data is disingenuous. That study did not show a reduction in stathmin levels upon betapix loss. Rather, it showed an increase in Ser16 phosphorylation on stathmin, which reduces stathmin's microtubule destabilising function. Please elaborate on the difference between the two studies.

      We completely agree with the reviewer’s statement that, in the cited article, increased Ser16 phosphorylation of stathmin reduces its microtubule-destabilising function. While that study did not show a reduction in Stathmin levels, others have shown that transcriptionally downregulated Stathmins are associated with impaired neuronal and glial development. We have now revised the Discussion section by adding a new paragraph to address the disrupted homeostasis of Stathmins in these previous studies and their possible association with our data. We hope that these changes clarify this issue.

      (23) Line 310: While ZFHX3 levels are reduced in betapix mutants and KD in glioblastomas, were ZFHX3 and 4 up- or downregulated in the scRNA-Seq data?

      Thanks for this critical point. Indeed, our results showed that ZFHX3 and 4 were down-regulated in betaPix knockouts, both in the glial progenitor cluster in the scRNA-Seq data (Figure S8A) and in FACS-sorted glial cells (Figure S8B).

      (24) Line 317: "... betaPix acts upstream to Zfhx3/4-VEGFA signaling in regulating angiogenesis ...". While this is established later, the data at the time of this sentence does not warrant this claim.

      We agree with the reviewer’s statement and have restated this sentence as follows: “Zfhx3/4 might act as downstream effector of betaPix.”

      Reviewer #2 (Recommendations for the authors):

      (1) The images shown in 2E/H, 3B, 6F/J can use a schematic that helps readers to understand what to expect or look for. Splitting up the channels may also help in visualizing the vasculature clearly.

      We thank the reviewer for these suggestions. In this revised version, we have included schematic diagrams in the figures and incorporated more detailed statements in the legends.

      (2) Many times, arrows are pointing to structures (2E/H, 3B), but are not explained clearly (neither in the text nor in the legends). In 3B, the arrow is pointing to a negative space.

      (3) Legends are minimalistic and do not provide much information. The reader is left to interpret the data on their own.

      We apologize for not explaining the figures in enough detail. In this revised version, we have incorporated more detailed statements in the figure legends and have adjusted the arrows in all figures.

      (4) The text needs heavy proofreading. For example:

      (a) Line 208- the title does not seem appropriate since the following text does not discuss Stathmins at all, which comes later.

      We agree with the reviewer’s statement and have restated the title as follows: “Single-cell transcriptome profiling reveals that gfap-positive progenitors were affected in betaPix knockouts.”

      (b) There is no mention of Figure 7 throughout the text.

      (c) Figure 7 does not include Zfhx or Vegfaa.

      We thank the reviewer for pointing out these errors. We have now revised Figure 7 and referred to it in the corresponding paragraphs of the Discussion section.

      (5) The discussion seems incoherent in its current state.

      We have now revised the Discussion section according to the suggestions from both reviewers. We hope these revisions adequately address your concerns.

      (6) Please include some of the following points, if possible, in the discussion.

      (a) How is GEF activity of Rac/Cdc42 expected to be affected in beta-pix KO fishes?

      (b) What are the possible different ways the angiogenic pathways merge onto endothelial cells? Or do the authors imagine this process to be entirely driven by glial cells (directly)?

      We would like to thank the reviewer for their invaluable suggestions. We have now revised the Discussion section and hope that these changes provide a better and more balanced discussion. Since we have no data directly related to how the GEF activity toward Rac/Cdc42 might be affected in betaPix mutants, and only very limited data showing how glial betaPix regulates cerebral endothelial cells and BBB function, we would like to keep the Discussion focused on the CRISPR-induced KI and cKO technologies, glial betaPix function and brain hemorrhage, and the putative role of betaPix-Zfhx3/4-VEGF function in central artery development.

      References:

      Daub, H., Gevaert, K., Vandekerckhove, J., Sobel, A., and Hall, A. (2001). Rac/Cdc42 and p65PAK regulate the microtubule-destabilizing protein stathmin through phosphorylation at serine 16. J Biol Chem 276, 1677-1680. 10.1074/jbc.C000635200.

      Kim S, Park H, Kang J, Choi S, Sadra A, Huh SO. β-PIX-d, a Member of the ARHGEF7 Guanine Nucleotide Exchange Factor Family, Activates Rac1 and Induces Neuritogenesis in Primary Cortical Neurons. Exp Neurobiol. 2024;33(5):215-224. doi:10.5607/en24026

      Kwon Y, Jeon YW, Kwon M, Cho Y, Park D, Shin JE. βPix-d promotes tubulin acetylation and neurite outgrowth through a PAK/Stathmin1 signaling pathway [published correction appears in PLoS One. 2020 May 13;15(5):e0233327. doi: 10.1371/journal.pone.0233327.]. PLoS One. 2020;15(4):e0230814. Published 2020 Apr 6. doi:10.1371/journal.pone.0230814

      Kwon Y, Lee SJ, Shin YK, Choi JS, Park D, Shin JE. Loss of neuronal βPix isoforms impairs neuronal morphology in the hippocampus and causes behavioral defects. Anim Cells Syst (Seoul). 2025;29(1):57-71. Published 2025 Jan 8. doi:10.1080/19768354.2024.2448999

      Wittmann, T., Bokoch, G.M., and Waterman-Storer, C.M. (2004). Regulation of microtubule destabilizing activity of Op18/stathmin downstream of Rac1. J Biol Chem 279, 6196-6203. 10.1074/jbc.M307261200.

      Zeitz, M., and Kierfeld, J. (2014). Feedback mechanism for microtubule length regulation by stathmin gradients. Biophys J 107, 2860-2871. 10.1016/j.bpj.2014.10.056.

    1. eLife Assessment

      This paper addresses the significant question of quantifying epistasis patterns, which affect the predictability of evolution, by reanalyzing a recently published combinatorial deep mutational scan experiment. The findings are that epistasis is fluid, i.e. strongly background dependent, but that fitness effects of mutations are predictable based on the wild-type phenotype. However, these potentially interesting claims are inadequately supported by the analysis, because measurement noise is not accounted for, arbitrary cutoffs are used, and global nonlinearities are not sufficiently considered. If the results continue to hold after these major improvements in the analysis, they should be of interest to all biologists working in the field of fitness landscapes.

    2. Reviewer #1 (Public review):

      This paper describes a number of patterns of epistasis in a large fitness landscape dataset recently published by Papkou et al. The paper is motivated by an important goal in the field of evolutionary biology to understand the statistical structure of epistasis in protein fitness landscapes, and it capitalizes on the unique opportunities presented by this new dataset to address this problem.

      The paper reports some interesting previously unobserved patterns that may have implications for our understanding of fitness landscapes and protein evolution. In particular, Figure 5 is very intriguing. However, I have two major concerns detailed below. First, I found the paper rather descriptive (it makes little attempt to gain deeper insights into the origins of the observed patterns) and unfocused (it reports what appears to be a disjointed collection of various statistics without a clear narrative). Second, I have concerns with the statistical rigor of the work.

      (1) I think Figures 5 and 7 are the main, most interesting, and novel results of the paper. However, I don't think that the statement "Only a small fraction of mutations exhibit global epistasis" accurately describes what we see in Figure 5. To me, the most striking feature of this figure is that the effects of most mutations at all sites appear to be a mixture of three patterns. The most interesting pattern noted by the authors is of course the "strong" global epistasis, i.e., when the effect of a mutation is highly negatively correlated with the fitness of the background genotype. The second pattern is a "weak" global epistasis, where the correlation with background fitness is much weaker or non-existent. The third pattern is the vertically spread-out cluster at low-fitness backgrounds, i.e., a mutation has a wide range of mostly positive effects that are clearly not correlated with fitness. What is very interesting to me is that all background genotypes fall into these three groups with respect to almost every mutation, but the proportions of the three groups are different for different mutations. In contrast to the authors' statement, it seems to me that almost all mutations display strong global epistasis in at least a subset of backgrounds. A clear example is C>A mutation at site 3.

      1a. I think the authors ought to try to dissect these patterns and investigate them separately rather than lumping them all together and declaring that global epistasis is rare. For example, I would like to know whether those backgrounds in which mutations exhibit strong global epistasis are the same for all mutations or whether they are mutation- or perhaps position-specific. Both answers could be potentially very interesting, either pointing to some specific site-site interactions or, alternatively, suggesting that the statistical patterns are conserved despite variation in the underlying interactions.

      1b. Another rather remarkable feature of this plot is that the slopes of the strong global epistasis patterns seem to be very similar across mutations. Is this the case? Is there anything special about this slope? For example, does this slope simply reflect the fact that a given mutation becomes essentially lethal (i.e., produces the same minimal fitness) in a certain set of background genotypes?

      1c. Finally, how consistent are these patterns with some null expectations? Specifically, would one expect the same distribution of global epistasis slopes on an uncorrelated landscape? Are the pivot points unusually clustered relative to an expectation on an uncorrelated landscape?

      1d. The shapes of the DFE shown in Figure 7 are also quite interesting, particularly the bimodal nature of the DFE in high-fitness (HF) backgrounds. I think this bimodality must be a reflection of the clustering of mutation-background combinations mentioned above. I think the authors ought to draw this connection explicitly. Do all HF backgrounds have a bimodal DFE? What mutations occupy the "moving" peak?

      1e. In several figures, the authors compare the patterns for HF and low-fitness (LF) genotypes. In some cases, there are some stark differences between these two groups, most notably in the shape of the DFE (Figure 7B, C). But there is no discussion about what could underlie these differences. Why are the statistics of epistasis different for HF and LF genotypes? Can the authors at least speculate about possible reasons? Why do HF and LF genotypes have qualitatively different DFEs? I actually don't quite understand why the transition between bimodal DFE in Figure 7B and unimodal DFE in Figure 7C is so abrupt. Is there something biologically special about the threshold that separates LF and HF genotypes? My understanding was that this was just a statistical cutoff. Perhaps the authors can plot the DFEs for all backgrounds on the same plot and just draw a line that separates HF and LF backgrounds so that the reader can better see whether the DFE shape changes gradually or abruptly.

      1f. The analysis of the synonymous mutations is also interesting. However, I think a few additional analyses are necessary to clarify what is happening here. I would like to know the extent to which synonymous mutations are more often neutral compared to non-synonymous ones. Then, do synonymous pairs interact in the same way as non-synonymous pairs (i.e., plot Figure 1 for synonymous pairs)? Do synonymous or non-synonymous mutations that are neutral exhibit less epistasis than non-neutral ones? Finally, do non-synonymous mutations alter epistasis among other mutations more often than synonymous mutations do? What about synonymous-neutral versus synonymous-non-neutral? Basically, I'd like to understand the extent to which a mutation that is neutral in a given background is more or less likely to alter epistasis between other mutations than a non-neutral mutation in the same background.

      (2) I have two related methodological concerns. First, in several analyses, the authors employ thresholds that appear to be arbitrary. And second, I did not see any account of measurement errors. For example, the authors chose the 0.05 threshold to distinguish between epistasis and no epistasis, but why this particular threshold was chosen is not justified. Another example: whether the product s12 × (s1 + s2) is greater or smaller than zero for any given mutation is uncertain due to measurement errors. Presumably, how to classify each pair of mutations should depend on the precision with which the fitness of mutants is measured. These thresholds could well be different across mutants. We know, for example, that low-fitness mutants typically have noisier fitness estimates than high-fitness mutants. I think the authors should use a statistically rigorous procedure to categorize mutations and their epistatic interactions. I think it is very important to address this issue. I got very concerned about it when I saw on LL 383-388 that synonymous stop codon mutations appear to modulate epistasis among other mutations. This seems very strange to me and makes me quite worried that this is a result of noise in LF genotypes.

    3. Reviewer #2 (Public review):

      Significance:

      This paper reanalyzes an experimental fitness landscape generated by Papkou et al., who assayed the fitness of all possible combinations of 4 nucleotide states at 9 sites in the E. coli DHFR gene, which confers antibiotic resistance. The 9 nucleotide sites make up 3 amino acid sites in the protein, of which one was shown to be the primary determinant of fitness by Papkou et al. This paper sought to assess whether pairwise epistatic interactions differ among genetic backgrounds at other sites and whether there are major patterns in any such differences. They use a "double mutant cycle" approach to quantify pairwise epistasis, where the epistatic interaction between two mutations is the difference between the measured fitness of the double-mutant and its predicted fitness in the absence of epistasis (which equals the sum of individual effects of each mutation observed in the single mutants relative to the reference genotype). The paper claims that epistasis is "fluid," because pairwise epistatic effects often differ depending on the genetic state at the other site. It also claims that this fluidity is "binary," because pairwise effects depend strongly on the state at nucleotide positions 5 and 6 but weakly on those at other sites. Finally, they compare the distribution of fitness effects (DFE) of single mutations for starting genotypes with similar fitness and find that despite the apparent "fluidity" of interactions this distribution is well-predicted by the fitness of the starting genotype.
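
      The double-mutant cycle described above can be sketched as follows. This is an illustrative example only, not code from the manuscript; the function name and the fitness values are hypothetical, and fitness is assumed to be on a scale where effects add in the absence of epistasis.

```python
def pairwise_epistasis(f_ref, f_a, f_b, f_ab):
    """Epistasis from a double-mutant cycle on an additive fitness scale.

    f_ref: fitness of the reference genotype
    f_a, f_b: fitness of the two single mutants
    f_ab: fitness of the double mutant
    """
    s_a = f_a - f_ref                 # effect of mutation A alone
    s_b = f_b - f_ref                 # effect of mutation B alone
    expected_ab = f_ref + s_a + s_b   # prediction without epistasis
    return f_ab - expected_ab         # deviation = pairwise epistasis


# Hypothetical values: each single mutant is deleterious, and the double
# mutant is fitter than the additive prediction (positive epistasis).
example = pairwise_epistasis(f_ref=1.0, f_a=0.8, f_b=0.7, f_ab=0.6)
```

      "Fluidity" in the sense used by the paper would correspond to this quantity changing when the same two mutations are evaluated in a different background.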

      The paper addresses an important question for genetics and evolution: how complex and unpredictable are the effects and interactions among mutations in a protein? Epistasis can make the phenotype hard to predict from the genotype and also affect the evolutionary navigability of a genotype landscape. Whether pairwise epistatic interactions depend on genetic background (that is, whether there are important high-order interactions) is important because interactions of order greater than pairwise would make phenotypes especially idiosyncratic and difficult to predict from the genotype (or by extrapolating from experimentally measured phenotypes of genotypes randomly sampled from the huge space of possible genotypes). Another interesting question is the sparsity of such high-order interactions: if they exist but mostly depend on a small number of identifiable sequence sites in the background, then this would drastically reduce the complexity and idiosyncrasy relative to a landscape on which "fluidity" involves interactions among groups of all sites in the protein. A number of papers in the recent literature have addressed the topics of high-order epistasis and sparsity and have come to conflicting conclusions. This paper contributes to that body of literature with a case study of one published experimental dataset of high quality. The findings are therefore potentially significant if convincingly supported.

      Validity:

      In my judgment, the major conclusions of this paper are not well supported by the data. There are three major problems with the analysis.

      (1) Lack of statistical tests. The authors conclude that pairwise interactions differ among backgrounds, but no statistical analysis is provided to establish that the observed differences are statistically significant, rather than being attributable to error and noise in the assay measurements. It has been established previously that the methods the authors use to estimate high-order interactions can result in inflated inferences of epistasis because of the propagation of measurement noise (see PMID 31527666 and 39261454). Error propagation can be extreme because first-order mutation effects are calculated as the difference between the measured phenotype of a single-mutant variant and the reference genotype; pairwise effects are then calculated as the difference between the measured phenotype of a double mutant and the sum of the differences described above for the single mutants. This paper claims fluidity when this latter difference itself differs when assessed in two different backgrounds. At each step of these calculations, measurement noise propagates. Because no statistical analysis is provided to evaluate whether these observed differences are greater than expected because of propagated error, the paper has not convincingly established or quantified "fluidity" in epistatic effects.

      (2) Arbitrary cutoffs. Many of the analyses involve assigning pairwise interactions into discrete categories, based on the magnitude and direction of the difference between the predicted and observed phenotypes for a pairwise mutant. For example, the authors categorize as a positive pairwise interaction if the apparent deviation of phenotype from prediction is >0.05, negative if the deviation is <-0.05, and no interaction if the deviation is between these cutoffs. Fluidity is diagnosed when the category for a pairwise interaction differs among backgrounds. These cutoffs are essentially arbitrary, and the effects are assigned to categories without assessing statistical significance. For example, an interaction of 0.06 in one background and 0.04 in another would be classified as fluid, but it is very plausible that such a difference would arise due to error alone. The frequency of epistatic interactions in each category as claimed in the paper, as well as the extent of fluidity across backgrounds, could therefore be systematically overestimated or underestimated, affecting the major conclusions of the study.

      (3) Global nonlinearities. The analyses do not consider the fact that apparent fluidity could be attributable to the fact that fitness measurements are bounded by a minimum (the fitness of cells carrying proteins in which DHFR is essentially nonfunctional) and a maximum (the fitness of cells in which some biological factor other than DHFR function is limiting for fitness). The data are clearly bounded; the original Papkou et al. paper states that 93% of genotypes are at the low-fitness limit at which deleterious effects no longer influence fitness. Because of this bounding, mutations that are strongly deleterious to DHFR function will therefore have an apparently smaller effect when introduced in combination with other deleterious mutations, leading to apparent epistatic interactions; moreover, these apparent interactions will have different magnitudes if they are introduced into backgrounds that themselves differ in DHFR function/fitness, leading to apparent "fluidity" of these interactions. This is a well-established issue in the literature (see PMIDs 30037990, 28100592, 39261454). It is therefore important to adjust for these global nonlinearities before assessing interactions, but the authors have not done this.
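
      Points (1) and (2) can be made concrete with a small numerical sketch. It is not taken from the manuscript: it assumes independent measurement errors with one common standard error per genotype (`sigma`, a hypothetical value), and the z-test is one simple choice among several possible tests. The 0.05 cutoff and the 0.06 versus 0.04 example are the ones quoted in point (2).

```python
import math
from statistics import NormalDist

CUTOFF = 0.05  # categorical threshold quoted in point (2)


def propagated_se(n_measurements, sigma):
    """SE of a sum/difference of n independent measurements, each with SE sigma."""
    return math.sqrt(n_measurements) * sigma


def classify_by_cutoff(eps, cutoff=CUTOFF):
    """Assign an epistasis estimate to a category using a fixed cutoff."""
    if eps > cutoff:
        return "positive"
    if eps < -cutoff:
        return "negative"
    return "none"


def differs_significantly(eps_1, eps_2, se_diff, alpha=0.05):
    """Two-sided z-test on the difference between two epistasis estimates."""
    z = abs(eps_1 - eps_2) / se_diff
    p_value = 2 * (1 - NormalDist().cdf(z))
    return p_value < alpha


sigma = 0.01                          # hypothetical per-genotype standard error
se_single = propagated_se(2, sigma)   # s = f_mut - f_ref
se_pair = propagated_se(4, sigma)     # eps = f_ab - f_a - f_b + f_ref
se_fluid = propagated_se(8, sigma)    # eps(background 1) - eps(background 2)

labels = (classify_by_cutoff(0.06), classify_by_cutoff(0.04))
fluid_by_cutoff = labels[0] != labels[1]                        # categories differ
fluid_by_test = differs_significantly(0.06, 0.04, se_fluid)     # difference within noise
```

      Under these assumptions the noise on a between-background comparison is sqrt(8) times the per-genotype error, so a category switch across the cutoff does not by itself indicate a real difference.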

      This global nonlinearity could explain much of the fluidity claimed in this paper. It could explain the observation that epistasis does not seem to depend as much on genetic background for low-fitness backgrounds, and the latter is constant (Figure 2B and 2C): these patterns would arise simply because the effects of deleterious mutations are all epistatically masked in backgrounds that are already near the fitness minimum. It would also explain the observations in Figure 7. For background genotypes with relatively high fitness, there are two distinct peaks of fitness effects, which likely correspond to neutral mutations and deleterious mutations that bring fitness to the lower bound of measurement; as the fitness of the background declines, the deleterious mutations have a smaller effect, so the two peaks draw closer to each other, and in the lowest-fitness backgrounds, they collapse into a single unimodal distribution in which all mutations are approximately neutral (with the distribution reflecting only noise).

      Global nonlinearity could also explain the apparent "binary" nature of epistasis. Sites 4 and 5 change the second amino acid, and the Papkou paper shows that only 3 amino acid states (C, D, and E) are compatible with function; all others abolish function and yield lower-bound fitness, while mutations at other sites have much weaker effects. The apparent binary nature of epistasis in Figure 5 corresponds to these effects given the nonlinearity of the fitness assay. Most mutations are close to neutral irrespective of the fitness of the background into which they are introduced: these are the "non-epistatic" mutations in the binary scheme. For the mutations at sites 4 and 5 that abolish one of the beneficial mutations, however, these have a strong background-dependence: they are very deleterious when introduced into a high-fitness background but their impact shrinks as they are introduced into backgrounds with progressively lower fitness.

      The apparent "binary" nature of global epistasis is likely to be a simple artifact of bounding and the bimodal distribution of functional effects: neutral mutations are insensitive to background, while the magnitude of the fitness effect of deleterious mutations declines with background fitness because they are masked by the lower bound. The authors' statement is that "global epistasis often does not hold." This is not established. A more plausible conclusion is that global epistasis imposed by the phenotype limits affects all mutations, but it does so in a nonlinear fashion.
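
      The bounding argument can be illustrated with a toy model. It is a sketch under stated assumptions and not an analysis of the Papkou et al. data: mutations are assumed to act additively on a hypothetical latent trait, and observed fitness is the latent value clipped at a lower bound (`FLOOR`). All numbers are invented for illustration.

```python
FLOOR = 0.0  # hypothetical lower bound of the fitness assay


def observed_fitness(latent):
    """Observed fitness is the latent (additive) trait clipped at the floor."""
    return max(latent, FLOOR)


def apparent_epistasis(background, effect_a, effect_b):
    """Double-mutant-cycle epistasis computed on the bounded, observed scale."""
    f_ref = observed_fitness(background)
    f_a = observed_fitness(background + effect_a)
    f_b = observed_fitness(background + effect_b)
    f_ab = observed_fitness(background + effect_a + effect_b)
    return f_ab - f_a - f_b + f_ref


# The same two additive mutations (-0.8 each on the latent scale) placed in
# backgrounds of decreasing fitness.
high = apparent_epistasis(1.0, -0.8, -0.8)
mid = apparent_epistasis(0.5, -0.8, -0.8)
low = apparent_epistasis(0.0, -0.8, -0.8)
```

      Although there is no interaction on the latent scale, the measured epistasis differs among the three backgrounds and vanishes at the floor, which is the pattern the reviewer attributes to global nonlinearity.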

      In conclusion, most of the major claims in the paper could be artifactual. Much of the claimed pairwise epistasis could be caused by measurement noise, the use of arbitrary cutoffs, and the lack of adjustment for global nonlinearity. Much of the fluidity or higher-order epistasis could be attributable to the same issues. And the apparently binary nature of global epistasis is also the expected result of this nonlinearity.

    4. Reviewer #3 (Public review):

      Summary:

      The authors have studied a previously published large dataset on the fitness landscape of a 9 base-pair region of the folA gene. The objective of the paper is to understand various aspects of epistasis in this system, which the authors have achieved through detailed and computationally expensive exploration of the landscape. The authors describe epistasis in this system as "fluid", meaning that it depends sensitively on the genetic background, thereby reducing the predictability of evolution at the genetic level. However, the study also finds two robust patterns. The first is the existence of a "pivot point" for a majority of mutations, which is a fixed growth rate at which the effect of mutations switches from beneficial to deleterious (consistent with a previous study on the topic). The second is the observation that the distribution of fitness effects (DFE) of mutations is predicted quite well by the fitness of the genotype, especially for high-fitness genotypes. While the work does not offer a synthesis of the multitude of reported results, the information provided here raises interesting questions for future studies in this field.

      Strengths:

      A major strength of the study is its detailed and multifaceted approach, which has helped the authors tease out a number of interesting epistatic properties. The study makes a timely contribution by focusing on topical issues like the prevalence of global epistasis, the existence of pivot points, and the dependence of DFE on the background genotype and its fitness. The methodology is presented in a largely transparent manner, which makes it easy to interpret and evaluate the results.

      The authors have classified pairwise epistasis into six types and found that the type of epistasis changes depending on background mutations. Switches happen more frequently for mutations at functionally important sites. Interestingly, the authors find that even synonymous mutations in stop codons can alter the epistatic interaction between mutations in other codons. Consistent with these observations of "fluidity", the study reports limited instances of global epistasis (which predicts a simple linear relationship between the size of a mutational effect and the fitness of the genetic background in which it occurs). Overall, the work presents some evidence for the genetic context-dependent nature of epistasis in this system.

      Weaknesses:

      Despite the wealth of information provided by the study, there are some shortcomings of the paper which must be mentioned.

      (1) In the Significance Statement, the authors say that the "fluid" nature of epistasis is a previously unknown property. This is not accurate. What the authors describe as "fluidity" is essentially the prevalence of certain forms of higher-order epistasis (i.e., epistasis beyond pairwise mutational interactions). The existence of higher-order epistasis is a well-known feature of many landscapes. For example, in an early work, (Szendro et al., J. Stat. Mech., 2013), the presence of a significant degree of higher-order epistasis was reported for a number of empirical fitness landscapes. Likewise, (Weinreich et al., Curr. Opin. Genet. Dev., 2013) analysed several fitness landscapes and found that higher-order epistatic terms were on average larger than the pairwise term in nearly all cases. They further showed that ignoring higher-order epistasis leads to a significant overestimate of accessible evolutionary paths. The literature on higher-order epistasis has grown substantially since these early works. Any future versions of the present preprint will benefit from a more thorough contextual discussion of the literature on higher-order epistasis.

      (2) In the paper, the term 'sign epistasis' is used in a way that is different from its well-established meaning. (Pairwise) sign epistasis, in its standard usage, is said to occur when the effect of a mutation switches from beneficial to deleterious (or vice versa) when a mutation occurs at a different locus. The authors require a stronger condition, namely that the sum of the individual effects of two mutations should have the opposite sign from their joint effect. This is a sufficient condition for sign epistasis, but not a necessary one. The property studied by the authors is important in its own right, but it is not equivalent to sign epistasis.

      (3) The authors have looked for global epistasis in all 108 (9x12) mutations, out of which only 16 showed a correlation of R^2 > 0.4. 14 out of these 16 mutations were in the functionally important nucleotide positions. Based on this, the authors conclude that global epistasis is rare in this landscape, and further, that mutations in this landscape can be classified into one of two binary states - those that exhibit global epistasis (a small minority) and those that do not (the majority). I suspect, however, that a biologically significant binary classification based on these data may be premature. Unsurprisingly, mutational effects are stronger at the functional sites as seen in Figure 5 and Figure 2, which means that even if global epistasis is present for all mutations, a statistical signal will be more easily detected for the functionally important sites. Indeed, the authors show that the means of DFEs decrease linearly with background fitness, which hints at the possibility that a weak global epistatic effect may be present (though hard to detect) in the individual mutations. Given the high importance of the phenomenon of global epistasis, it pays to be cautious in interpreting these results.

      (4) The study reports that synonymous mutations frequently change the nature of epistasis between mutations in other codons. However, it is unclear whether this should be surprising, because, as the authors have already noted, synonymous mutations can have an impact on cellular functions. The reader may wonder if the synonymous mutations that cause changes in epistatic interactions in a certain background also tend to be non-neutral in that background. Unfortunately, the fitness effect of synonymous mutations has not been reported in the paper.

      (5) The authors find that DFEs of high-fitness genotypes tend to depend only on fitness and not on genetic composition. This is an intriguing observation, but unfortunately, the authors do not provide any possible explanation or connect it to theoretical literature. I am reminded of work by (Agarwala and Fisher, Theor. Popul. Biol., 2019) as well as (Reddy and Desai, eLife, 2023) where conditions under which the DFE depends only on the fitness have been derived. Any discussion of possible connections to these works could be a useful addition.
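
      The global-epistasis criterion discussed in point (3) can be sketched as a simple regression of a mutation's effect on background fitness. The data below are invented, the helper name is hypothetical, and ordinary least squares is assumed; only the R^2 > 0.4 threshold comes from the manuscript as quoted above. The "pivot point" is computed here as the background fitness at which the fitted effect crosses zero.

```python
def fit_global_epistasis(background_fitness, mutation_effect):
    """Least-squares fit of mutation effect against background fitness.

    Returns (slope, intercept, r_squared).
    """
    n = len(background_fitness)
    mean_x = sum(background_fitness) / n
    mean_y = sum(mutation_effect) / n
    sxx = sum((x - mean_x) ** 2 for x in background_fitness)
    syy = sum((y - mean_y) ** 2 for y in mutation_effect)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(background_fitness, mutation_effect))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy ** 2) / (sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r_squared


# Hypothetical effects of one mutation measured in five backgrounds.
backgrounds = [0.2, 0.4, 0.6, 0.8, 1.0]
effects = [0.10, 0.00, -0.10, -0.25, -0.30]

slope, intercept, r2 = fit_global_epistasis(backgrounds, effects)
shows_global_epistasis = r2 > 0.4   # threshold used in the manuscript
pivot_point = -intercept / slope    # background fitness where the effect is zero
```

      As the reviewer notes, a low R^2 for a weak-effect mutation may reflect limited statistical power rather than the absence of a fitness dependence, so a threshold of this kind is best read alongside the size of the effects.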

    5. Author response:

      Thank you for sharing a detailed review of our manuscript titled “Variations and predictability of epistasis on an intragenic fitness landscape.” We have now carefully gone through the reviewers’ and the editor’s comments and have the following preliminary responses.

      (1) Measurement noise in the folA fitness landscape. All three reviewers and the editors raise the important matter of incorporating measurement noise in the fitness landscape. The paper by Papkou and coworkers makes the fitness measurements of the landscape in six independent repeats. They show that the fitness data is highly correlated in each repeat, and use the weighted mean of the repeats to report their results. They do not study how measurement noise influences their findings. The results by Papkou and coworkers were our starting point, and hence, we built on the landscape properties reported in their study. As a result, we also analyse our results working with the same mean of the six independent measurements.

      The main result of the work by Papkou and coworkers is that the largest subgraph in the landscape has 514 fitness peaks.

      We revisit this result by quantifying how measurement noise changes this number. By doing this, we note the subgraph contains only 127 peaks which are statistically significant. We define a sequence as a peak when its corresponding fitness is greater than all its one-distance neighbours with a p-value < 0.05. This shows that, as pointed out in the reviews, incorporating noise in the landscape results significantly changes how we view the landscape – a facet not included in Papkou et al and the current version of our manuscript. 

      Not incorporating measurement noise means that the entire landscape has 4055 peaks. When measurement noise is included in the analysis, this number reduces to 137, out of which 136 are high fitness backgrounds (functional). 
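
      A noise-aware peak call of the kind described above could look like the following sketch. The response does not state which test is used, so the one-sided z-test on mean fitness and standard error, the absence of a multiple-testing correction, and all names and toy values here are assumptions made for illustration.

```python
from statistics import NormalDist

NUCLEOTIDES = "ACGT"


def one_mutant_neighbours(seq):
    """All sequences that differ from seq at exactly one position."""
    return [seq[:i] + n + seq[i + 1:]
            for i in range(len(seq))
            for n in NUCLEOTIDES if n != seq[i]]


def is_significant_peak(seq, mean, se, alpha=0.05):
    """True if seq is fitter than every one-mutant neighbour with p < alpha.

    mean, se: dicts mapping each sequence to its mean fitness and standard
    error across replicates (standard errors are assumed to be non-zero).
    """
    for neighbour in one_mutant_neighbours(seq):
        diff = mean[seq] - mean[neighbour]
        se_diff = (se[seq] ** 2 + se[neighbour] ** 2) ** 0.5
        p_value = 1 - NormalDist().cdf(diff / se_diff)  # one-sided
        if p_value >= alpha:
            return False
    return True


# Toy two-site landscape in which "AA" has the highest mean fitness.
toy_sequences = ["AA"] + one_mutant_neighbours("AA")
toy_mean = {s: (1.0 if s == "AA" else 0.5) for s in toy_sequences}
precise = {s: 0.05 for s in toy_sequences}   # small measurement error
noisy = {s: 0.5 for s in toy_sequences}      # large measurement error

peak_when_precise = is_significant_peak("AA", toy_mean, precise)
peak_when_noisy = is_significant_peak("AA", toy_mean, noisy)
```

      For the nine-nucleotide landscape each genotype has 27 one-mutant neighbours, so a genotype that is a peak on mean fitness can fail this test if even one neighbour is within measurement error.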

      In the revised version of our manuscript, we will incorporate measurement noise in our analysis. Through this, we will also address the concern regarding the use of an arbitrary cut-off to study “fluid” epistasis. However, we note that arbitrary cut-offs to define DFEs have been recently used (Sane et al., PNAS, 2023).

      We also note that previous work with large scale landscapes (Wu et al, eLife, 2016) also reported a fitness landscape with a single experiment, with no repeats. 

      (2) Global nonlinearities and higher-order epistasis leading to fluid epistasis. Attempts at building models for higher-order epistasis from empirical data have largely been confined to landscapes of a limited data size. For example, Sailer & Harms, Genetics, 2017 propose models for higher-order epistasis from seven empirical data sets, each with fewer than 100 data points. Another recent attempt (Park et al, Nat Comm, 2024) proposes rules for protein structure-function with 20 fitness landscapes. In this study, only one landscape which used fitness as a phenotype had ~160000 data points (of which only 42% were included for analysis). All other data sets which used fitness as a phenotype contained fewer than 10000 data points. While these statistical proposals of how higher-order epistasis operates exist, none of them rely on a large-scale, exhaustive network like the one reported by Papkou and coworkers.

      In the edited manuscript, we will replace our arbitrary cut-off with results of statistical tests carried out based on measurement noise. 

      Global non-linearities shape evolutionary responses. We would like to emphasize that the goal of this work is to study and understand how these global non-linearities result in patterns on a large fitness landscape by presenting the sum total of these fundamental factors in shaping statistical patterns.

      While we understand that we may not have sufficiently explained the effects of global non-linearities on our results, we do not agree with the reviewer’s conclusion that our results are artifacts of these non-linearities. We will expand on the role of these nonlinearities on the patterns that we observe (like, fitness being bounded, as pointed out by reviewer 2, or differential impact of a mutation in functional vs. non-functional variants).

      We also speculate that changing our arbitrary cut-off (selection coefficient of 0.05) to measurement noise will not alter our results qualitatively. 

      The question we address in our work is, therefore, how the nature of epistasis changes with genetic background over a large, exhaustive landscape. The nature of epistasis between two mutations is analysed in all 4^7 backgrounds. The causative agents for the change in epistasis will be context-dependent, depending on the precise nature of the two mutations and the background. For instance, a certain background might simply introduce a Stop codon in the sequence. Notwithstanding these precise, local mechanistic explanations, we seek to answer how epistasis changes statistically in a sequence. Investigating statistical patterns which explain switches in the nature of epistasis in deep, exhaustive landscapes is a long-term goal of this research.
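
      The count of backgrounds follows from fixing two of the nine nucleotide sites and letting the remaining seven vary over four states each. The short enumeration below is only an illustration of that count; the site indices and function name are hypothetical.

```python
from itertools import product

NUCLEOTIDES = "ACGT"
N_SITES = 9


def backgrounds_for_pair(site_i, site_j):
    """Yield every nucleotide assignment to the sites outside the focal pair."""
    other_sites = [s for s in range(N_SITES) if s not in (site_i, site_j)]
    for states in product(NUCLEOTIDES, repeat=len(other_sites)):
        yield dict(zip(other_sites, states))


# Number of backgrounds in which one pair of sites can be evaluated.
n_backgrounds = sum(1 for _ in backgrounds_for_pair(0, 1))
```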

      (3) Last, in our revised manuscript, we will address the reviewers’ other minor comments on the various aspects of the manuscript.

    1. eLife Assessment

      This valuable study introduces the peptidisc-TPP approach as a promising solution to challenges in membrane proteomics, enabling thermal proteome profiling in a detergent-free system. The concept is innovative and holds significant potential, and the demonstration of its utility and validation is solid. The method presents a strong foundation for broader applications in identifying physiologically and pharmacologically relevant membrane protein-ligand interactions.

    2. Reviewer #1 (Public review):

      Summary:

      The idea is appealing, but the authors have not sufficiently demonstrated the utility of this approach.

      Strengths:

      Novelty of the approach, potential implications for discovering novel interactions

      Comments on revisions:

      The authors have adequately addressed most of my concerns in this improved version of the manuscript.

    3. Reviewer #2 (Public review):

      Summary:

      The membrane mimetic thermal proteome profiling (MM-TPP) presented by Jandu et al. promises a useful way to minimize the interference of detergents in efficient mass spectrometry analysis of membrane proteins. Thermal proteome profiling is a mass spectrometric method that measures binding of a drug to different proteins in a cell lysate by monitoring thermal stabilization of the proteins because of the interaction with the ligands that are being studied. This method has been underexplored for membrane proteome because of the inefficient mass spectrometric detection of membrane proteins and because of the interference from detergents that are used often for membrane protein solubilization.

      Strengths:

      In this report the binding of ligands to membrane protein targets has been monitored in crude membrane lysates or tissue homogenates exalting the efficacy of the method to detect both intended and off-target binding events in a complex physiologically relevant sample setting. The manuscript is lucidly written and the data presented seems clear. Kudos to the authors. This methodology shows immense potential for identifying membrane protein binders (small-molecule or protein) in a near-native environment, and as a result promises to be a great tool for drug discovery campaigns.

      Weaknesses:

      While this is a solid report and a promising tool for analyzing membrane protein drug interactions in a detergent-free environment, it is crucial to bear in mind that the process of reconstitution begins with detergent solubilization of the proteome and does not completely circumvent structural perturbations invoked by detergents.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary: 

      The idea is appealing, but the authors have not sufficiently demonstrated the utility of this approach.

      Strengths: 

      Novelty of the approach, potential implications for discovering novel interactions

      Weaknesses:

      The Duong lab had introduced their highly elegant peptidisc approach several years ago. In this present work, they combine it with thermal proteome profiling (TPP) and attempt to demonstrate the utility of this combination for identifying novel membrane protein-ligand interactions.

      While I find this idea intriguing, and the approach potentially useful, I do not feel that the authors had sufficiently demonstrated the utility of this approach. My main concern is that no novel interactions are identified and validated. For the presentation of any new methodology, I think this is quite necessary. In addition, except for MsbA, no orthogonal methods are used to support the conclusions, and the authors rely entirely on quantifying rather small differences in abundances using either iBAQ or LFQ.

      We thank the reviewer for their thoughtful comments. In this revision, we have experimentally addressed the reviewer’s concerns in three ways:

      (1) To demonstrate the utility of our MM-TPP method over the detergent-based TPP workflow (termed DB-TPP), we performed a side-by-side comparison using ATP–VO₄ at 51 °C (Figure 3B and Figure 4A). From the DB-TPP dataset, 7.4% of all identified proteins were annotated as ATP-binding, while 6.4% of proteins differentially stabilized were annotated as ATP-binding. In contrast, in the MM-TPP dataset, 9.3% of all identified proteins were annotated as ATP-binding proteins, while 17% of proteins differentially stabilized were annotated as ATP-binding. The lack of enrichment in the detergent-based approach indicates that the observed differences are likely stochastic, rather than a result of specific ATP–VO₄-mediated stabilization as found with MM-TPP. For instance, several key proteins (BCS1, P2RY6, SLC27A2, ABCB1, ABCC2, and ABCC9) found differentially stabilized using the MM-TPP method showed no such pattern in the DB-TPP dataset. This divergence strongly supports the specificity and utility of our Peptidisc approach.

      (2) To demonstrate that MM-TPP can resolve not only the broader effects of ATP–VO₄ but also specific ligand–protein interactions, we employed 2-methylthio-ADP (2-MeS-ADP), a selective agonist of the P2RY12 receptor [PMID: 24784220]. In that case, we observed clear thermal stabilization of P2RY12, with a more than 6-fold increase in stability at both 51 °C and 57 °C (–log₁₀ p > 5.97; Figure 4B and Figure S4). Notably, no other proteins, including the structurally related but non-responsive P2RY6 receptor, showed a comparable stabilization fold change at these temperatures.

      (3) To further probe the reproducibility of the method, we performed an independent MM-TPP evaluation with ATP–VO₄ at 51 °C using data-independent acquisition (DIA), in contrast to the data-dependent acquisition (DDA) approach used in the initial study (Figure S5). Overall, 7.8% of all identified proteins were annotated as ATP-binding, and as before, this proportion increased to 17% among proteins with log₂ fold changes greater than 0.5. Specifically, BCS1 and SLC27A2 exhibited strong stabilization (log₂ fold change > 1), while P2RY6, ABCB11, ABCC2, and ABCG2 showed moderate stabilization (log₂ fold changes between 0.5 and 1). Consistent with previous results, P2RX4 was destabilized, with a log₂ fold change below –1. These findings support the consistency and reproducibility of the method across distinct data acquisition methods.
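      The enrichment argument in points (1) and (3) can be made explicit as a ratio: the share of ATP-binding proteins among the differentially stabilized proteins divided by their share among all identified proteins. The sketch below is a minimal illustration using only the percentages quoted above; the function name and dictionary layout are assumptions made for this example, and the DIA "hits" set is defined by log₂ fold change > 0.5 rather than by the DDA criteria.

```python
# Percentages are copied from the response text; the names and data layout
# below are illustrative assumptions, not the authors' analysis code.

def fold_enrichment(pct_in_hits: float, pct_in_background: float) -> float:
    """Share of ATP-binding proteins among hits divided by their share overall."""
    if pct_in_background <= 0:
        raise ValueError("background percentage must be positive")
    return pct_in_hits / pct_in_background

datasets = {
    "DB-TPP (DDA)": {"background": 7.4, "hits": 6.4},
    "MM-TPP (DDA)": {"background": 9.3, "hits": 17.0},
    # For DIA, "hits" are proteins with log2 fold change > 0.5.
    "MM-TPP (DIA)": {"background": 7.8, "hits": 17.0},
}

enrichment = {
    name: fold_enrichment(values["hits"], values["background"])
    for name, values in datasets.items()
}

for name, value in enrichment.items():
    print(f"{name}: {value:.2f}x")
```

      A ratio near or below 1 indicates no enrichment, which is the pattern reported for the detergent-based workflow; both Peptidisc datasets give a ratio of roughly 2.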

      My main concern is that no novel interactions are identified and validated. For the presentation of any new methodology, I think this is quite necessary.  

      The primary objective of our study is to establish and benchmark the MM-TPP workflow using known targets, rather than to discover novel ligand–protein interactions. Identifying new binders requires extensive screening and downstream validations, which we believe is beyond the scope of this methodological report. Instead, our study highlights the sensitivity and reliability of the MM-TPP approach by demonstrating consistent and reproducible results with well-characterized interactions.

      We respectfully disagree with the notion that introducing a new methodology must necessarily include the discovery of novel interactions. For instance, Martinez Molina et al. [PMID: 23828940] introduced the cellular thermal shift assay (CETSA) by validating established targets such as MetAP2 with TNP-470 and CDK2 with AZD-5438, without identifying novel protein–ligand pairs. Similarly, Kalxdorf et al. [PMID: 33398190] published their cell-surface thermal proteome profiling (CS-TPP) using Ouabain to stabilize the Na⁺/K⁺-ATPase pump in K562 cells, and SB431542 to stabilize its canonical target JAG1. In fact, when these methods revealed additional stabilizations, these were not validated but instead interpreted through reasoning grounded in the literature. For instance, they attributed the SB431542-induced stabilization of MCT1 to its reported role in cell migration and tumor invasiveness, and explained that SLC1A2 stabilization is related to the disruption of Na⁺/K⁺-ATPase activity by Ouabain. In the same way, our interpretation of ATP-VO₄–mediated stabilization of Mao-B is justified by predictive AlphaFold-3 rather than direct orthogonal assays, which are beyond the scope of our methodological presentation. 

      Collectively, the influential studies cited above have set methodological precedents by prioritizing validation and proof-of-concept over merely finding uncharacterized binders. In the same spirit, our work is centred on establishing MM-TPP as a robust platform for probing membrane protein–ligand interactions in a water-soluble format. The discovery of novel binders remains an exciting future direction—one that will build upon the methodological foundation laid by the present study.

      In addition, except for MsbA, no orthogonal methods are used to support the conclusions, and the authors rely entirely on quantifying rather small differences in abundances using either iBAQ or LFQ.

      We deliberately began this study with our model protein, MsbA, examined under both native and overexpressed conditions, to establish agreement between MM-TPP (Figure 2D) and biochemical stability assays (Figure 2A). This validation has provided us with the foundation to confidently extend MM-TPP to the mouse organ proteome. To demonstrate the validity of our workflow, we have used ATP-VO₄ because it has expected targets.

      We note that orthogonal validation often requires overproduction and purification of the candidate proteins, including suitable antibodies, which is a true challenge for membrane proteins. Here, we demonstrate that MM-TPP can detect ligand-induced thermal shifts directly in native membrane preparations, without requiring protein overproduction or purification. We also emphasize several influential studies in TPP, including Martinez Molina et al. (PMID: 23828940) and Fang et al. (PMID: 34188175), which focused primarily on establishing and benchmarking the methodology, rather than on extensive orthogonal validation. In the same spirit, our study prioritizes methodological development, and accordingly, several orthogonal validations are now included in this revision.

      [...] and the authors rely entirely on quantifying rather small differences in abundances using either iBAQ or LFQ.

      To clarify, all analyses of ligand-induced stabilization or destabilization were carried out using LFQ values. The sole exception is Figure 2B, where we used iBAQ values to depict the relative abundance of proteins within a single sample; this shows MsbA's relative level within the E. coli peptidisc library.

      Respectfully, we disagree with the assertion that we are “quantifying rather small differences in abundances using either iBAQ or LFQ.” We were able to clearly distinguish between stabilizations driven by specific ligands binding to their targets versus those caused by non-specific ligands with broader activity. This is further confirmed by comparing 2-MeS-ADP, a selective ligand for P2RY12, with ATP-VO₄, a highly promiscuous ligand, and AMP-PNP, which exhibits intermediate breadth. When tested in triplicate at 51 °C, 2-MeS-ADP significantly altered the thermal stability of 27 proteins, AMP-PNP 44 proteins, and ATP-VO₄ 230 proteins, consistent with the expectation that broader ligands stabilize more proteins nonspecifically. Importantly, 2-MeS-ADP produced markedly stronger stabilization of its intended target, P2RY12 (–log₁₀p = 9.32), than the top stabilized proteins for ATP–VO₄ (DNAJB3, –log₁₀p = 5.87) or AMP-PNP (FTH1, –log₁₀p = 5.34). Moreover, 2-MeS-ADP did not significantly stabilize proteins that were consistently stabilized by the broad ligands, such as SLC27A2, which was strongly stabilized by both ATP-VO₄ and AMP-PNP (–log₁₀p > 2.5). Together, these findings demonstrate that MM-TPP can robustly distinguish between broad-spectrum and target-specific ligands, with selective ligands inducing stronger and more physiologically meaningful stabilization at their intended targets compared to promiscuous ligands.

      Finally, we emphasize that our findings are not marginal, but meet quantitative and statistical rigor consistent with best practices in proteomics. We apply dual thresholds combining effect size (|log₂FC| ≥ 1, i.e., at least a two-fold change) with statistical significance (FDR-adjusted p ≤ 0.05)—criteria commonly used in proteomics methodology studies (e.g., PMID: 24942700, 38724498). Moreover, the stabilization and destabilization events we report are reproducible across biological replicates (n = 3), consistent across adjacent temperatures for most targets, and technically robust across acquisition modes (DDA vs. DIA). Taken together, these results reflect statistically valid and biologically meaningful effects, fully aligned with standards set by prior published proteomics studies.
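      The dual-threshold rule described above (|log₂FC| ≥ 1 together with an FDR-adjusted p ≤ 0.05) can be sketched as follows. This is a minimal illustration under stated assumptions: the response does not name the adjustment procedure, so Benjamini-Hochberg is assumed here, and the protein names and values are hypothetical toy data rather than results from the study.

```python
from typing import List, Sequence, Tuple

def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

def dual_threshold_hits(
    proteins: Sequence[Tuple[str, float, float]],
    fc_cutoff: float = 1.0,    # |log2 fold change| >= 1, i.e. at least two-fold
    fdr_cutoff: float = 0.05,  # adjusted p <= 0.05
) -> List[str]:
    """Keep proteins that pass both the effect-size and the significance filter."""
    names, log2fc, pvals = zip(*proteins)
    adjusted = benjamini_hochberg(pvals)
    return [
        name
        for name, fc, q in zip(names, log2fc, adjusted)
        if abs(fc) >= fc_cutoff and q <= fdr_cutoff
    ]

# Hypothetical toy input: (protein, log2 fold change, raw p-value).
toy = [
    ("PROT_A", 2.6, 0.0001),
    ("PROT_B", 1.2, 0.0300),
    ("PROT_C", 0.4, 0.0002),   # significant but below the effect-size cutoff
    ("PROT_D", -1.5, 0.0040),  # destabilized proteins pass via the absolute value
    ("PROT_E", 1.1, 0.2000),   # large change but not significant
]
hits = dual_threshold_hits(toy)
print(hits)
```

      Requiring both conditions excludes small but statistically significant shifts as well as large but noisy ones, which is the purpose of combining the two thresholds.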

      Furthermore, the reported changes in abundances are solely based on iBAQ or LFQ analysis. This must be supported by a more quantitative approach such as SILAC or labeled peptides. In summary, I think this story requires a stronger and broader demonstration of the ability of peptidisc-TPP to identify novel physiologically/pharmacologically relevant interactions.

      With respect to labeling strategies, we deliberately avoided using TMT due to concerns about both cost and potential data quality issues. Some recent studies have documented the drawbacks of TMT in contexts directly relevant to our work. For example, a benchmarking study of LiP-MS workflows showed that although TMT increased proteome depth and reduced technical variance, it was less accurate in identifying true drug–protein interactions and produced weaker dose–response correlations compared with label-free DIA approaches [PMID: 40089063]. More broadly, technical reviews have highlighted that isobaric tagging is intrinsically prone to ratio compression and reporter-ion interference due to co-isolation and co-fragmentation of peptides, which flatten measured fold-changes and obscure biologically meaningful differences [PMID: 22580419, 22036744]. In terms of SILAC, the technique requires metabolic incorporation of heavy amino acids, which is feasible in cultured cells but not in physiologically relevant tissues such as the liver organ used here. SILAC mouse models exist, but they are expensive and time-consuming [PMID: 18662549, 21909926]. We are not a mouse lab, and introducing liver organ SILAC labeling in our workflow is beyond the scope of these revisions. We also note that several hallmark TPP studies have been successfully carried out using label-free quantification [PMID: 25278616, 26379230, 33398190, 23828940], establishing this as an accepted and widely applied approach in the field.

      To further support our conclusions, we added controls showing that detergent solubilization of mouse liver membranes followed by SP4 cleanup fails to detect ATP-VO₄-mediated stabilization of ATP-binding proteins, underscoring the necessity of Peptidisc reconstitution for capturing ligand-induced thermal stabilization. We also present new data demonstrating selective stabilization of the P2Y12 receptor by its agonist 2-MeS-ADP, providing orthogonal, receptor-specific validation within the MM-TPP framework. Finally, an orthogonal DIA acquisition on separate replicates confirmed robust ATP-vanadate stabilization of ATP-binding proteins, including BCS1L and SLC27A2. Together, these additions reinforce that the observed stabilizations are genuine, physiologically relevant ligand–protein interactions and highlight the unique advantage of the Peptidisc-based workflow in capturing such events.

      Cited Reference:

      24784220: Zhang J, Zhang K, Gao ZG, et al. Agonist-bound structure of the human P2Y₁₂ receptor. Nature.  2014;509(7498):119-122. doi:10.1038/nature13288. 

      23828940: Martinez Molina D, Jafari R, Ignatushchenko M, et al. Monitoring drug target engagement in cells and tissues using the cellular thermal shift assay. Science. 2013;341(6141):84-87. doi:10.1126/science.1233606.

      33398190: Kalxdorf M, Günthner I, Becher I, et al. Cell surface thermal proteome profiling tracks perturbations and drug targets on the plasma membrane. Nat Methods. 2021;18(1):84-91. doi:10.1038/s41592-020-01022-1.

      34188175: Fang S, Kirk PDW, Bantscheff M, Lilley KS, Crook OM. A Bayesian semi-parametric model for thermal proteome profiling. Commun Biol. 2021;4(1):810. doi:10.1038/s42003-021-02306-8.

      24942700: Cox J, Hein MY, Luber CA, Paron I, Nagaraj N, Mann M. Accurate proteome-wide label-free quantification by delayed normalization and maximal peptide ratio extraction, termed MaxLFQ. Mol Cell Proteomics. 2014;13(9):2513-2526. doi:10.1074/mcp.M113.031591.

      38724498: Peng H, Wang H, Kong W, Li J, Goh WWB. Optimizing differential expression analysis for proteomics data via high-performing rules and ensemble inference. Nat Commun. 2024;15(1):3922. doi:10.1038/s41467-024-47899-w.

      40089063: Koudelka T, Bassot C, Piazza I. Benchmarking of quantitative proteomics workflows for limited proteolysis mass spectrometry. Mol Cell Proteomics. 2025;24(4):100945. doi:10.1016/j.mcpro.2025.100945.

      22580419: Christoforou AL, Lilley KS. Isobaric tagging approaches in quantitative proteomics: the ups and downs. Anal Bioanal Chem. 2012;404(4):1029-1037. doi:10.1007/s00216-012-6012-9. 

      22036744: Christoforou AL, Lilley KS. Isobaric tagging approaches in quantitative proteomics: the ups and downs. Anal Bioanal Chem. 2012;404(4):1029-1037. doi:10.1007/s00216-012-6012-9. 

      18662549: Krüger M, Moser M, Ussar S, et al. SILAC mouse for quantitative proteomics uncovers kindlin-3 as an essential factor for red blood cell function. Cell. 2008;134(2):353-364. doi:10.1016/j.cell.2008.05.033.

      21909926: Zanivan S, Krueger M, Mann M. In vivo quantitative proteomics: the SILAC mouse. Methods Mol Biol. 2012;757:435-450. doi:10.1007/978-1-61779-166-6_25. 

      25278616: Kalxdorf M, Becher I, Savitski MM, et al. Temperature-dependent cellular protein stability enables high-precision proteomics profiling. Nat Methods. 2015;12(12):1147-1150. doi:10.1038/nmeth.3651.

      26379230: Savitski MM, Reinhard FBM, Franken H, et al. Tracking cancer drugs in living cells by thermal profiling of the proteome. Science. 2015;346(6205):1255784. doi:10.1126/science.1255784. 

      33452728: Leuenberger P, Ganscha S, Kahraman A, et al. Cell-wide analysis of protein thermal unfolding reveals determinants of thermostability. Science. 2020;355(6327):eaai7825. doi:10.1126/science.aai7825. 

      23066101: Savitski MM, Zinn N, Faelth-Savitski M, et al. Quantitative thermal proteome profiling reveals ligand interactions and thermal stability changes in cells. Nat Methods. 2013;10(12):1094-1096. doi:10.1038/nmeth.2766.  

      30858367: Piazza I, Kochanowski K, Cappelletti V, et al. A machine learning-based chemoproteomic approach to identify drug targets and binding sites in complex proteomes. Nat Commun. 2019;10(1):1216. doi:10.1038/s41467-019-09199-0.

      Reviewer #2 (Public Review):

      Summary:

      The membrane mimetic thermal proteome profiling (MM-TPP) presented by Jandu et al. seems to be a useful way to minimize the interference of detergents in efficient mass spectrometry analysis of membrane proteins. Thermal proteome profiling is a mass spectrometric method that measures binding of a drug to different proteins in a cell lysate by monitoring thermal stabilization of the proteins because of the interaction with the ligands that are being studied. This method has been underexplored for membrane proteome because of the inefficient mass spectrometric detection of membrane proteins and because of the interference from detergents that are used often for membrane protein solubilization.

      Strengths:

      In this report the binding of ligands to membrane protein targets has been monitored in crude membrane lysates or tissue homogenates exalting the efficacy of the method to detect both intended and off-target binding events in a complex physiologically relevant sample setting.

      The manuscript is lucidly written and the data presented seems clear. The only insignificant grammatical error I found was that the 'P' in the word peptidisc is not capitalized in the beginning of the methods section "MM-TPP profiling on membrane proteomes". The clear writing made it easy to understand and evaluate what has been presented. Kudos to the authors.

      Weaknesses:

      While this is a solid report and a promising tool for analyzing membrane protein drug interactions, addressing some of the minor caveats listed below could make it much more impactful.

      The authors claim that MM-TPP is done by "completely circumventing structural perturbations invoked by detergents". This may not be entirely accurate, because before reconstitution of the membrane proteins in peptidisc, the membrane fractions are solubilized by 1% DDM. The solubilization and following centrifugation step lasts at least for 45 min. It is less likely that all the structural perturbations caused by DDM to various membrane proteins and their transient interactions become completely reversed or rescued by peptidisc reconstitution.

      We thank the reviewer for this insightful comment. In response, we have revised the sentence and expanded the discussion to clarify that the Peptidisc provides a complementary approach to detergent-based preparations for studying membrane proteins, preserving native lipid–protein interactions and stabilization effects that may be diminished in detergent.

      To further address the structural perturbations invoked by detergents, and as already detailed in our response to Reviewer 1, we have compared the thermal profile of the Peptidisc library to that of mouse liver membranes solubilized with 1% DDM, after incubation with ATP–VO₄ at 51 °C (Figure 4A). The results with the detergent extract revealed random patterns of stabilization and destabilization, with only 6.4% of differentially stabilized proteins being ATP-binding, comparable to the 7.4% observed in the background. In contrast, in the Peptidisc library, 17% of differentially stabilized proteins were ATP-binding, compared to 9.3% in the background. Thus, while Peptidisc reconstitution does not fully avoid initial detergent exposure, these findings underscore the importance of implementing Peptidisc in the TPP workflow when dealing with membrane proteins.

      In the introduction, the authors make statements such as "..it is widely acknowledged that even mild detergents can disrupt protein structures and activities, leading to challenges in accurately identifying drug targets.." and "[peptidisc] libraries are instrumental in capturing and stabilizing IMPs in their functional states while preserving their interactomes and lipid allosteric modulators...'. These need to be rephrased, as it has been shown by countless studies that even with membrane protein suspended in micelles robust ligand binding assays and binding kinetics have been performed leading to physiologically relevant conclusions and identification of protein-protein and protein-ligand interactions.

      We thank the reviewer for this valuable feedback and fully agree with the point raised. In response, we have revised the Introduction and conclusion to moderate the language concerning the limitations of detergent use. We now explicitly acknowledge that numerous studies have successfully used detergent micelles for ligand-binding assays and kinetic analyses, yielding physiologically relevant insights into both protein–protein and protein–ligand interactions [e.g., PMID: 22004748, 26440106, 31776188].

      At the same time, we clarify that the Peptidisc method offers a complementary advantage, particularly in the context of thermal proteome profiling (TPP), which involves mass spectrometry workflows that are incompatible with detergents. In this setting, Peptidiscs facilitate the detection of ligand-binding events that may be more difficult to observe in detergent micelles.

      We have reframed our discussion accordingly to present Peptidiscs not as a replacement for detergent-based methods, but rather as a complementary tool that broadens the available methodological landscape for studying membrane protein interactions.

      If the method involves detergent solubilization, for example using 1% DDM, it is a bit disingenuous to argue that 'interactomes and lipid allosteric modulators' characterized by low-affinity interactions will remain intact or can be rescued upon detergent removal. Authors should discuss this or at least highlight the primary caveat of the peptidisc method of membrane protein reconstitution - which is that it begins with detergent solubilization of the proteome and does not completely circumvent structural perturbations invoked by detergents.

      We would like to clarify that, in our current workflow, ligand incubation occurs after reconstitution into Peptidiscs. As such, the method is designed to circumvent the negative effects of detergent during the critical steps involving low-affinity interactions.

      That said, we fully acknowledge that Peptidisc reconstitution begins with detergent solubilization (e.g., 1% DDM), and we have revised the conclusion to explicitly state this important caveat. As the reviewer correctly points out, this initial step may introduce some structural perturbations or result in the loss of weakly associated lipid modulators.

      However, reconstitution into Peptidiscs rapidly restores a detergent-free environment for membrane proteins, which has been shown in our previous studies [PMID: 38577106, 38232390, 31736482, 31364989] to mitigate these effects. Specifically, we have demonstrated that time-limited DDM exposure, followed by Peptidisc reconstitution, minimizes membrane protein delipidation, enhances thermal stability, retains functionality, and preserves multi-protein assemblies.

      It would also be important to test detergents that are even milder than 1% DDM and ones which are harsher than 1% DDM to show that this method of reconstitution can indeed rescue the perturbations to the structure and interactions of the membrane protein done by detergents during solubilization step. 

      We selected 1% DDM based on our previous work [PMID: 37295717, 39313981, 38232390], where it consistently enabled robust and reproducible solubilization for Peptidisc reconstitution. We agree that comparing milder detergents (e.g., LMNG) and harsher ones (e.g., SDC) would provide valuable insights into how detergent strength influences structural perturbations, and how effectively these can be mitigated by Peptidisc reconstitution. Preliminary data (not shown) from mouse liver membranes indicate broadly similar proteomic profiles following solubilization with DDM, LMNG, and SDC, although potential differences in functional activity or ligand binding remain to be investigated.

      Based on the methods provided, it appears that the final amount of detergent in peptidisc membrane protein library was 0.008%, which is ~150 uM. The CMC of DDM depending on the amount of NaCl could be between 120-170 uM.

      While we cannot entirely rule out the presence of residual DDM (0.008%) in the raw library, its free concentration may be lower than initially estimated. This is related to the formation of mixed micelles with the amphipathic peptide scaffold, which is supplied in excess during reconstitution. These mixed micelles are subsequently removed during the ultrafiltration step. Furthermore, in related work using His-tagged Peptidiscs [PMID: 32364744], we purified the library by nickel-affinity chromatography following a 5× dilution into a detergent-free buffer. Although this purification step reduced the number of soluble proteins, the same membrane proteins were retained, suggesting that any residual detergent does not significantly interfere with Peptidisc reconstitution. Supporting this, our MM-TPP assays on purified libraries (data not shown) consistently demonstrated stabilization of ATP-binding proteins (e.g., SLC27A2, DNAJB3), indicating that the observed ligand–protein interactions result from successful incorporation into Peptidiscs.
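      The reviewer's conversion of 0.008% DDM to roughly 150 µM can be checked with a short calculation. The sketch below assumes the percentage is weight per volume (the text does not state this) and uses the molar mass of n-dodecyl-β-D-maltoside (C₂₄H₄₆O₁₁, about 510.6 g/mol); the function and variable names are illustrative. It gives the total residual detergent, not the free concentration, which the response argues is lower because of mixed-micelle formation with the peptide scaffold.

```python
# Assumption: 0.008% is weight/volume. Molar mass of DDM (C24H46O11) is ~510.62 g/mol.
DDM_MOLAR_MASS_G_PER_MOL = 510.62

def percent_wv_to_micromolar(percent_wv: float, molar_mass_g_per_mol: float) -> float:
    """Convert a % (w/v) concentration to micromolar."""
    grams_per_litre = percent_wv * 10.0  # 1% (w/v) = 1 g per 100 mL = 10 g/L
    return grams_per_litre / molar_mass_g_per_mol * 1e6

residual_um = percent_wv_to_micromolar(0.008, DDM_MOLAR_MASS_G_PER_MOL)

# CMC range quoted in the reviewer's comment (depends on NaCl concentration).
cmc_range_um = (120.0, 170.0)
within_cmc_range = cmc_range_um[0] <= residual_um <= cmc_range_um[1]

print(f"Total residual DDM: {residual_um:.0f} uM; within quoted CMC range: {within_cmc_range}")
```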

      Perhaps, to completely circumvent the perturbations from detergents other methods of detergent-free solubilization such as using SMA polymers and SMALP reconstitution could be explored for a comparison. Moreover, a comparison of the peptidisc reconstitution with detergent-free extraction strategies, such as SMA copolymers, could lend more strength to the presented method.

      We agree that detergent-free methods such as SMA polymers hold promise for membrane protein solubilization. However, in preliminary single-replicate experiments using SMA2000 at 51 °C in the presence of ATP–VO₄ (data not shown), we observed broad, non-specific stabilization effects. Of the 2,287 quantified proteins, 9.3% were annotated as ATP-binding, yet 9.9% of the 101 proteins showing a log₂ fold change >1 or <–1 were ATP-binding, indicating no meaningful enrichment. Given this lack of specificity and the limited dataset, we chose not to pursue further SMA experiments and have not included them here. However, in a recent study (https://doi.org/10.1101/2025.08.25.672181), we directly compared Peptidisc, SMA, and nanodiscs for liver membrane proteome profiling. In that work, Peptidisc outperformed both SMA and nanodiscs in detecting membrane protein dysregulation between healthy and diseased liver. By extension, we expect Peptidisc to offer superior sensitivity and specificity for detecting ligand-induced stabilization events, such as those observed here with ATP–vanadate.
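      One way to express "no meaningful enrichment" for the SMA2000 numbers is a one-sided hypergeometric test. The sketch below is an illustration only: the integer counts are back-calculated from the rounded percentages in the text (an assumption, since exact counts are not given), and the response does not state that this test was used.

```python
from math import comb

def hypergeom_sf(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) when drawing `draws` items without replacement from `population`
    items, of which `successes` carry the annotation of interest."""
    upper = min(successes, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return favourable / comb(population, draws)

# Counts reconstructed from the rounded percentages quoted above (assumption).
quantified = 2287
atp_background = round(0.093 * quantified)  # ATP-binding among all quantified
changed = 101
atp_changed = round(0.099 * changed)        # ATP-binding among changed proteins

expected = changed * atp_background / quantified
p_value = hypergeom_sf(atp_changed, quantified, atp_background, changed)

print(f"ATP-binding among changed proteins: {atp_changed} (expected by chance: {expected:.1f})")
print(f"One-sided hypergeometric p-value: {p_value:.2f}")
```

      With the observed count this close to the chance expectation, the test gives no evidence of enrichment, in line with the conclusion drawn in the response.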

      Cross-verification of the identified interactions, and subsequent stabilization or destabilizations, should be demonstrated by other in vitro methods of thermal stability and ligand binding analysis using purified protein to support the efficacy of the MM-TPP method. An example cross-verification using SDS-PAGE, of the well-studied MsbA, is shown in Figure 2. In a similar fashion, other discussed targets such as, BCS1L, P2RX4, DgkA, Mao-B, and some un-annotated IMPs shown in supplementary figure 3 that display substantial stabilization or destabilization should be cross-verified.

      We appreciate this suggestion and note that a similar point was raised in R1’s comment “In addition, except for MsbA, no orthogonal methods are used to support the conclusions, and the authors rely entirely on quantifying rather small differences in abundances using either iBAQ or LFQ.” We have developed a detailed response to R1 on this matter, which equally applies here. 

      Cited Reference:

      35616533: Young JW, Wason IS, Zhao Z, et al. Development of a Method Combining Peptidiscs and Proteomics to Identify, Stabilize, and Purify a Detergent-Sensitive Membrane Protein Assembly. J Proteome Res. 2022;21(7):1748-1758. doi:10.1021/acs.jproteome.2c00129. PMID: 35616533.

      31364989: Carlson ML, Stacey RG, Young JW, et al. Profiling the Escherichia coli membrane protein interactome captured in Peptidisc libraries. Elife. 2019;8:e46615. doi:10.7554/eLife.46615. 

      22004748: O'Malley MA, Helgeson ME, Wagner NJ, Robinson AS. Toward rational design of protein detergent complexes: determinants of mixed micelles that are critical for the in vitro stabilization of a G-protein coupled receptor. Biophys J. 2011;101(8):1938-1948. doi:10.1016/j.bpj.2011.09.018.

      26440106: Allison TM, Reading E, Liko I, Baldwin AJ, Laganowsky A, Robinson CV. Quantifying the stabilizing effects of protein-ligand interactions in the gas phase. Nat Commun. 2015;6:8551. doi:10.1038/ncomms9551.

      31776188: Beckner RL, Zoubak L, Hines KG, Gawrisch K, Yeliseev AA. Probing thermostability of detergent-solubilized CB2 receptor by parallel G protein-activation and ligand-binding assays. J Biol Chem. 2020;295(1):181-190. doi:10.1074/jbc.RA119.010696.

      38577106: Jandu RS, Yu H, Zhao Z, Le HT, Kim S, Huan T, Duong van Hoa F. Capture of endogenous lipids in peptidiscs and effect on protein stability and activity. iScience. 2024;27(4):109382. doi:10.1016/j.isci.2024.109382.

      38232390: Antony F, Brough Z, Zhao Z, Duong van Hoa F. Capture of the Mouse Organ Membrane Proteome Specificity in Peptidisc Libraries. J Proteome Res. 2024;23(2):857-867. doi:10.1021/acs.jproteome.3c00825.

      31736482: Saville JW, Troman LA, Duong Van Hoa F. PeptiQuick, a one-step incorporation of membrane proteins into biotinylated peptidiscs for streamlined protein binding assays. J Vis Exp. 2019;(153). doi:10.3791/60661. 

      37295717: Zhao Z, Khurana A, Antony F, et al. A Peptidisc-Based Survey of the Plasma Membrane Proteome of a Mammalian Cell. Mol Cell Proteomics. 2023;22(8):100588. doi:10.1016/j.mcpro.2023.100588. 

      39313981: Antony F, Brough Z, Orangi M, Al-Seragi M, Aoki H, Babu M, Duong van Hoa F. Sensitive Profiling of Mouse Liver Membrane Proteome Dysregulation Following a High-Fat and Alcohol Diet Treatment. Proteomics. 2024;24(23-24):e202300599. doi:10.1002/pmic.202300599. 

      32364744: Young JW, Wason IS, Zhao Z, Rattray DG, Foster LJ, Duong Van Hoa F. His-Tagged Peptidiscs Enable Affinity Purification of the Membrane Proteome for Downstream Mass Spectrometry Analysis. J Proteome Res. 2020;19(7):2553-2562. doi:10.1021/acs.jproteome.0c00022.

      32591519: The M, Käll L. Focus on the spectra that matter by clustering of quantification data in shotgun proteomics. Nat Commun. 2020;11(1):3234. doi:10.1038/s41467-020-17037-3. 

      33188197: Kurzawa N, Becher I, Sridharan S, et al. A computational method for detection of ligand-binding proteins from dose range thermal proteome profiles. Nat Commun. 2020;11(1):5783. doi:10.1038/s41467-020-19529-8.

      26524241: Reinhard FBM, Eberhard D, Werner T, et al. Thermal proteome profiling monitors ligand interactions with cellular membrane proteins. Nat Methods. 2015;12(12):1129-1131. doi:10.1038/nmeth.3652. 

      23828940: Martinez Molina D, Jafari R, Ignatushchenko M, et al. Monitoring drug target engagement in cells and tissues using the cellular thermal shift assay. Science. 2013;341(6141):84-87. doi:10.1126/science.1233606. 

      32133759: Mateus A, Kurzawa N, Becher I, et al. Thermal proteome profiling for interrogating protein interactions. Mol Syst Biol. 2020;16(3):e9232. doi:10.15252/msb.20199232. 

      14755328: Dorsam RT, Kunapuli SP. Central role of the P2Y12 receptor in platelet activation. J Clin Invest. 2004;113(3):340-345. doi:10.1172/JCI20986. 

      Reviewer #1 (Recommendations for the authors):

      “The authors use iBAC or LFQ to compare across samples. This inconsistency is puzzling. As far as I know, LFQ should always be used when comparing across samples”

      As mentioned above, we use iBAQ only in Fig. 2B to illustrate within-sample relative abundance; all comparative analyses elsewhere use LFQ. We have updated the Fig. 2B legend to state this explicitly.

      We used iBAQ in Fig. 2B because it provides a measure of protein abundance within a sample by normalizing the summed peptide intensities by the number of theoretically observable peptides. This normalization facilitates comparisons between proteins within the same sample, offering a clearer understanding of their relative molar proportions [PMID: 33452728]. LFQ, by contrast, is optimized for comparing the same protein across different samples. It achieves this by performing delayed normalization to reduce run-to-run variability and by applying maximal peptide ratio extraction, which integrates pairwise peptide intensity ratios across all samples to build a consistent protein-level quantification matrix [PMID: 24942700]. These features make LFQ more robust to missing values and technical variation, thereby enabling accurate detection of relative abundance changes in the same protein under different experimental conditions. This distinction is well supported by the proteomics literature: Smits et al. [PMID: 23066101] used iBAQ specifically to determine the relative abundance of proteins within one sample, whereas LFQ was applied for comparative analyses between conditions.
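
As a rough illustration of the iBAQ normalization described above, the following minimal sketch divides summed peptide intensities by the number of theoretically observable peptides and then expresses each protein as a fraction of the sample total. The function names, protein labels, and intensity values are hypothetical and chosen only for illustration; this is not the quantification pipeline used in the study.

```python
def ibaq(peptide_intensities, n_theoretical_peptides):
    """Summed peptide intensity divided by the number of theoretically
    observable peptides (hypothetical helper, for illustration only)."""
    if n_theoretical_peptides <= 0:
        raise ValueError("n_theoretical_peptides must be positive")
    return sum(peptide_intensities) / n_theoretical_peptides


def relative_molar_fractions(ibaq_by_protein):
    """Express each protein's iBAQ value as a fraction of the sample total."""
    total = sum(ibaq_by_protein.values())
    return {protein: value / total for protein, value in ibaq_by_protein.items()}


# Made-up intensities for two proteins measured in the same sample.
sample = {
    "protein_A": ibaq([4.0e6, 2.0e6], n_theoretical_peptides=3),
    "protein_B": ibaq([1.0e6, 1.0e6, 2.0e6, 4.0e6], n_theoretical_peptides=8),
}
fractions = relative_molar_fractions(sample)
```

In this toy case protein_B has the larger summed intensity but also more theoretically observable peptides, so after normalization protein_A is estimated to be the more abundant of the two within the sample. That within-sample comparison is the only use of iBAQ the response describes.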

      “[Regarding Figure 2A] Why does the control also contain ATP-vanadate? Also, I am not aware of a commercially available chemical "ATP-VO4". I assume this is a mistake”

      The control condition in Figure 2A was mislabeled, and the figure has been corrected to remove this discrepancy. In our experiments, ATP and orthovanadate (VO<sub>4</sub>) were added together, and for simplicity this was annotated as “ATP-VO<sub>4</sub>.” 

      “[Regarding Figure 2B] What is the fold change in MsbA iBAQ values? It seems that the differences are quite small, and as such require a more quantitative approach than iBAQ (e.g SILAC or some other internal standard). In addition, what information does this panel add relative to 2C”

      The figure has been updated to clarify that the values shown are log₂-transformed iBAQ intensities. Figures 2B and 2C are complementary: Figure 2B shows that in the control sample, MsbA’s peptide abundance decreases with increasing temperature (51, 56, and 61 °C) relative to the remaining bulk proteins. Figure 2C shows the specific thermal profiles of MsbA in control and ATP–vanadate conditions. To make this clearer, we have added a sentence to the Results section explaining the specific role of Figure 2B.

      Together, these panels indicate that the method can identify ligand-induced stabilization even for proteins whose abundance decreases faster than the bulk during the TPP assay. We have provided the rationale for not using SILAC or TMT labeling in our public response.

      “[Regarding Figure 2C] Although not mentioned in the legend, I assume this is iBAQ quantification, which as mentioned above isn't accurate enough for such small differences. In addition, I find this data confusing: why is MsbA more stable at the lower temperatures in the absence of ATP-vanadate? The smoothed-line representation is misleading, certainly given the low number of data points”

      The data presented represent LFQ values for MsbA, and we have updated the figure legend to clearly indicate this. Additionally, as suggested, we have removed the smoothing line to more accurately reflect the data. Regarding the reviewer’s concern about stability at lower temperatures, we note that MsbA exhibits comparable abundance at 38 °C and 46 °C under both conditions, with overlapping error bars. We therefore interpret these data as indicating no significant difference in stability at the lower temperatures, with ligand-dependent stabilization becoming apparent only at elevated temperatures. We do not exclude the possibility that MsbA stability at these temperatures is affected by the conformational dynamics of this ABC transporter upon ATP binding and hydrolysis.

      “[Regarding Figure 3A] is this raw LFQ data? Why did the authors suddenly change from iBAQ to LFQ? I find this inconsistency puzzling”

      To clarify, all analyses of protein stabilization or destabilization presented in the manuscript are based on LFQ values. The only instance where iBAQ was used is Figure 2B, where it served to illustrate the relative peptide abundance of MsbA within the same sample. We have revised the figure legends and text to make this distinction explicit and ensure consistency in presentation.

      “[Regarding Figure 3B] The non-specific ATP-dependent stabilization increases the likelihood of false positive hits. This limitation is not mentioned by the authors. I think it is important to show other small molecules, in addition to ATP. The authors suggest that their approach is highly relevant for drug screening. Therefore, a good choice is to test an effect of a known stabilizing drug (eg VX-809 and CFTR)”

      We thank the reviewer for this suggestion. As noted in the manuscript (Results and Discussion sections), ATP is a natural hydrotrope and is therefore expected to induce broad, non-specific stabilization effects, a phenomenon also observed in previous proteome-wide studies, which demonstrated ATP’s widespread influence on cytosolic protein solubility and thermal stability (PMID: 30858367). To demonstrate that MM-TPP can resolve specific ligand–protein interactions beyond these global ATP effects, we tested 2-methylthio-ADP (2-MeS-ADP), a selective agonist of P2RY12 (PMID: 14755328). In these experiments, we observed robust and reproducible stabilization of P2RY12 at both 51 °C and 57 °C, with no consistent stabilization of unrelated proteins across temperatures. This provides direct evidence that our workflow can distinguish specific from non-specific ligand-induced effects. We selected 2-MeS-ADP because of its structural stability and its higher receptor affinity than ADP, allowing us to extend our existing workflow while testing a receptor-specific interaction. We agree that extending this approach to clinically relevant small-molecule drugs, such as VX-809 with CFTR, would further underscore the pharmacological potential of MM-TPP, and we have now noted this as an important avenue for future studies.

      “X axis of Figure 3B: Log 2 fold difference of what? iBAQ? LFQ? Similar ambiguity regarding the Y axis of 3E. What peptide? And why the constant changes in estimating abundances?”

      We thank the reviewer for pointing out these inaccuracies in the figure annotations. As mentioned above, all analyses (except Figure 2B) are based on LFQ values. We have revised the figure legends and text to make this clear.

      In Figure 3E, “peptide intensity” refers to log2 LFQ peptide intensities derived from the BCS1L protein, as indicated in the figure caption. 

      “The authors suggest that P2RY6 and P2RY12 are stabilized by ADP, the hydrolysis product of ATP. Currently, the support for this suggestion is highly indirect. To support this claim, the authors need to directly show the effect of ADP. In reference to the alpha fold results shown in Figure 4D, the authors state that "Collectively, these data highlight the ability of MM-TPP to detect the side effects of parent compounds, an important consideration for drug development". To support this claim, it is necessary to show that Mao-B is indeed best stabilized with ADP or AMP, rather than ATP.”

      In this revision, we chose not to test ADP directly, as it is a broadly binding, relatively weak ligand that would likely stabilize many proteins without revealing clear target-specific effects. Since we had already evaluated ATP-VO₄, a similarly broad, non-specific ligand, additional testing with ADP would provide limited additional insight. Instead, we prioritized 2-methylthio-ADP, a selective agonist of P2RY12, to more effectively demonstrate the specificity of MM-TPP. With this ligand, we observed clear and reproducible stabilization of P2RY12, underscoring the ability of MM-TPP to resolve receptor–ligand interactions beyond ATP’s broad hydrotropic effects. Importantly, and as expected, we did not observe stabilization of the related purinergic receptor P2RY6, further supporting the specificity of the observed effect.

      We have also revised the AlphaFold-related statement in Figure 4D to adopt a more cautious tone: “Collectively, these data suggest that MM-TPP may detect potential side effects of parent compounds, an important consideration for drug development.” In this context, we use AlphaFold not as a validation tool, but rather as a structural aid to help rationalize why certain off-target proteins (e.g., ATP with Mao-B) exhibit stabilization.

      Reviewer #2 (Recommendations for the authors):

      “In the main text, it will be useful to include the unique peptides table of at least the targets discussed in the manuscript. For example, in presence of AMP-PNP at 51oC P2RY6 shows 4-6 peptides in all n=3 positive & negative ionization modes. But, for P2RY12 only 1-3 peptides were observed. Depending on the sequence length and the relative abundance in the cell of a protein of interest, the number of peptides observed could vary a lot per protein. Given the unique peptide abundance reported in the supplementary file, for various proteins in different conditions, it appears the threshold of observation of two unique peptides for a protein to be analyzed seems less stringent.”

      By applying a filter requiring at least two unique peptides in at least one replicate, we exclude, on average, 15–20% of the total identified proteins. We consider this a reasonable level of stringency that balances confidence in protein identification with the retention of relevant data. This threshold was selected because it aligns with established LC-MS/MS data analysis practices (PMID: 32591519, 33188197, 26524241), and we have included these references in the Methods section to justify our approach. We have included in this revision a Supplemental Table 2 showing the unique peptide counts for proteins highlighted in this study.  
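
The filtering rule described above can be sketched as follows. This is a minimal illustration that assumes unique peptide counts are available per protein and per replicate; the function names and example counts are hypothetical and do not come from the study's data or software.

```python
def passes_unique_peptide_filter(unique_peptides_per_replicate, min_peptides=2):
    """True if at least one replicate has min_peptides or more unique peptides."""
    return any(count >= min_peptides for count in unique_peptides_per_replicate)


def filter_proteins(counts_by_protein, min_peptides=2):
    """Keep only the proteins that satisfy the unique-peptide threshold."""
    return {
        protein: counts
        for protein, counts in counts_by_protein.items()
        if passes_unique_peptide_filter(counts, min_peptides)
    }


# Made-up unique peptide counts across n = 3 replicates.
example_counts = {
    "protein_A": [4, 5, 6],  # retained
    "protein_B": [1, 2, 1],  # retained: one replicate reaches two peptides
    "protein_C": [1, 0, 1],  # excluded: never reaches two peptides
}
retained = filter_proteins(example_counts)
```

Under this rule a protein seen with one to three peptides, like the hypothetical protein_B, is kept as long as a single replicate reaches the threshold. Requiring the threshold in every replicate would be stricter and would exclude it.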

      “It appears that the time of heat treatment for peptidisc library subjected to MM-TPP profiling was chosen as 3 min based on the results presented in Supplementary Figure 1A, especially the loss of MsbA observed in 1% DDM after 3 min heat perturbation. However, when reconstituted in peptidisc there seems to be no loss in MsbA even after 12 mins at 45oC. So, perhaps a longer heat treatment would be a more efficient perturbation.”

      Previous studies indicate that heat exposure of 3–5 minutes is optimal for visualizing protein denaturation (PMID: 23828940, 32133759). We have added a statement to the Results section to justify our choice of heat exposure. Although MsbA remains stable at 45 °C for extended periods, higher temperatures allow for more effective perturbation to reveal destabilization. Supplementary Figure 1A specifically illustrates MsbA instability in detergent environments.

      “Some of the stabilized temperatures listed in Table 1 are a bit confusing. For example, ABCC3 and ABCG2. In the case of ABCC3 stabilization was observed at 51oC and 60oC, but 56oC is not mentioned. In the same way, 51oC is not mentioned for ABCG2. You would expect protein to be stabilized at 56oC if it is stabilized at both 51oC and 60oC. So, it is unclear if the stabilizations were not monitored for these proteins at the missing temperatures in the table or if no peptides could be recorded at these temperatures as in the case of P2RX4 at 60oC in Figure 4C.”

      Both scenarios are represented in our data. For some proteins, like ABCG2, sufficient peptide coverage was achieved, but no stabilization was observed at intermediate temperatures (e.g., 56 °C), likely because the perturbation was not strong enough to reveal an effect. In other cases, such as ABCC3 at 56 °C or P2RX4 at 60 °C, the proteins were not detected due to insufficient peptide identifications at those temperatures, which explains their omission from the table. 

      “In Figure 4C, it is perplexing to note that despite n = 3 there were no peptide fragments detected for P2RX4 at 60oC in presence of ATP-VO4, but they were detected in presence of AMP-PNP. It will be useful to learn authors explanation for this, especially because both of these ligands destabilize P2RX4. In Figure 4B, it would have been great to see the effect of ADP too, to corroborate the theory that ATP metabolites could impact the thermal stability.”

      In Figure 4C, the absence of P2RX4 peptide detection at 60 °C with ATP–VO₄ mirrors variability observed in the corresponding control (n = 6). Specifically, neither the control nor ATP–VO₄ produced unique peptides for P2RX4 at 60 °C in that replicate, whereas peptides were detected at 60 °C in other replicates for both the control and AMP-PNP, and at 64 °C for ATP–VO<sub>4</sub>, the controls, and AMP-PNP. Such missing values are a natural feature of MS-based proteomics and can arise from multiple technical factors, including inconsistent heating, incomplete digestion, stochastic MS injection, or interference from Peptidisc peptides. We therefore interpret the absence of peptides in this replicate as a technical artifact rather than evidence against protein destabilization. Importantly, the overall dataset consistently shows that both ATP–VO₄ and AMP-PNP destabilize P2RX4, supporting their characterization as broad, non-specific ligands with off-target effects.

      Because ATP and ADP belong to the same class of broadly binding, non-specific ligands, additional testing with ADP would not provide meaningful mechanistic insight. Instead, we chose to test 2-methylthio-ADP, a selective P2RY12 agonist. This experiment revealed robust, reproducible stabilization of P2RY12, without consistent effects on unrelated proteins at 51 °C and 57 °C, thereby demonstrating the ability of MM-TPP to detect specific receptor–ligand interactions.

      Finally, we note that P2RX4 is not a primary target of ATP–VO<sub>4</sub> or AMP-PNP. Consequently, the observed destabilization of P2RX4 is expected to be less pronounced than the strong, physiologically consistent stabilization of ABC transporters by ATP–VO<sub>4</sub>, as shown in Figure 3D, where the majority of ABC transporters are thermally stabilized across all tested temperatures.

      “As per Figure 4, P2Y receptors P2RY6 and P2RY12 both showed great thermal stability in presence of ATP-VO4 despite their preference for ADP. The authors argue this could be because of ATP metabolism, and binding of the resultant ADP to the P2RY6. If P2RX4 prefers ATP and not the metabolized product ADP that apparently is available, ideally you should not see a change in stability. A stark destabilization would indicate interaction of some sorts. P2X receptors are activated by ATP and are not naturally activated by AMP-PNP. So, destabilization of P2RX4 upon binding to ATP that can activate P2X receptors is conceivable. However, destabilization both in presence of ATP-VO4 and AMP-PNP is unclear. It is perhaps useful to test effect of ADP using this method, and maybe even compare some antagonists such as TNPATP.”

      In this study, we did not directly test ADP, as we had already demonstrated that MM-TPP detects stabilization by broad-binding ligands such as ATP–VO₄. Instead, we focused on a more selective ligand, 2-MeS-ADP, a specific agonist of P2RY12 [PMID: 14755328]. Here, we observed robust and reproducible stabilization of P2RY12 at 51 °C and 57 °C, while P2RY6 showed no significant changes, and no other proteins were consistently stabilized (Figure 4B, S4). This confirms that MM-TPP can distinguish specific ligand–receptor interactions from broader ATP-induced effects. To further explore the assay’s nuance and sensitivity, testing additional nucleotide ligands—including antagonists like TNP-ATP or ATPγS—would provide valuable insights, and we have identified this as an important future direction.

    1. eLife Assessment

      This valuable study reports the physiological function of a putative transmembrane UDP-N-acetylglucosamine transporter called SLC35G3 in spermatogenesis. The conclusion that SLC35G3 is a new and essential factor for male fertility in mice and probably in humans is supported by convincing data. This study will be of interest to reproductive biologists and physicians working on male infertility.

    2. Reviewer #2 (Public review):

      Summary:

      This study characterized the function of SLC35G3, a putative transmembrane UDP-N-acetylglucosamine transporter, in spermatogenesis. They showed that SLC35G3 is testis-specific and expressed in round spermatids. Slc35g3-null males were sterile but females were fertile. Slc35g3-null males produced normal sperm count but sperm showed subtle head morphology. Sperm from Slc35g3-null males have defects in uterotubal junction passage, ZP binding, and oocyte fusion. Loss of SLC35G3 causes abnormal processing and glycosylation of a number of sperm proteins in testis and sperm. They demonstrated that SLC35G3 functions as a UDP-GlcNAc transporter in cell lines. Two human SLC35G3 variants impaired its transporter activity, implicating these variants in human infertility.

      Strengths:

      This study is thorough. The mutant phenotype is strong and interesting. The major conclusions are supported by the data. This study demonstrated SLC35G3 as a new and essential factor for male fertility in mice, which is likely conserved in humans.

      Weaknesses:

      Some data interpretations needed to be revised. These have been adequately addressed in the revised manuscript.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      In the present manuscript, Mashiko and colleagues describe a novel phenotype associated with deficient SLC35G3, a testis-specific sugar transporter that is important in glycosylation of key proteins in sperm function. The study characterizes a knockout mouse for this gene and the multifaceted male infertility that ensues. The manuscript is well-written and describes novel physiology through a broad set of appropriate assays.

      Strengths:

      Robust analysis with detailed functional and molecular assays

      Weaknesses:

      (1) The abstract references reported mutations in human SLC35G3, but this is not discussed or correlated to the murine findings to a sufficient degree in the manuscript. The HEK293T experiments are reasonable and add value, but a more detailed discussion of the clinical phenotype of the known mutations in this gene and whether they are recapitulated in this study (or not) would be beneficial.

      Since no patients have been identified, our experiment was conducted to investigate the activity of the mutation found in humans.

      (2) Can the authors expand on how this mutation causes such a wide array of phenotypic defects? I am surprised there is a morphological defect, a fertilization defect, and a transit defect. Do the authors believe all of these are present in humans as well?

      Thank you for your comment. Many glycoprotein-coding genes that influence sperm head morphology, fertilization, and transit have been identified in knockout mouse studies, and most of these are conserved in humans. Therefore, we believe that glycan modification by SLC35G3 is also involved in the regulation of human sperm.

      Reviewer #2 (Public review):

      Summary:

      This study characterized the function of SLC35G3, a putative transmembrane UDP-N-acetylglucosamine transporter, in spermatogenesis. They showed that SLC35G3 is testis-specific and expressed in round spermatids. Slc35g3-null males were sterile, but females were fertile. Slc35g3-null males produced a normal sperm count, but sperm showed subtle head morphology. Sperm from Slc35g3-null males have defects in uterotubal junction passage, ZP binding, and oocyte fusion. Loss of SLC35G3 causes abnormal processing and glycosylation of a number of sperm proteins in the testis and sperm. They demonstrated that SLC35G3 functions as a UDP-GlcNAc transporter in cell lines. Two human SLC35G3 variants impaired their transporter activity, implicating these variants in human infertility.

      Strengths:

      This study is thorough. The mutant phenotype is strong and interesting. The major conclusions are supported by the data. This study demonstrated SLC35G3 as a new and essential factor for male fertility in mice, which is likely conserved in humans.

      Weaknesses:

      Some data interpretations need to be revised.

      Thank you for your comments. We have revised the interpretations.

      Reviewer #1 (Recommendations for the authors):

      (1) The introduction could be structured more efficiently. Much of what is discussed in the first paragraph appears to be redundant to the second paragraph (or perhaps unrelated to the present manuscript).

      In the Introduction, we described the process of glycoprotein formation: 1) quality control of nascent glycoproteins in the ER and its importance in sperm fertilizing ability, 2) glycan maturation in the Golgi apparatus and its importance in sperm fertilizing ability, and 3) the supply of nucleotide sugars as the basis of these processes.

      We would like to retain this structure in the revised manuscript and appreciate your understanding.

      (2) Given the significant difference in morphology between murine and human sperm, can the authors comment on whether these findings are directly translatable to humans?

      Thank you for your comment. There are significant differences in sperm morphology between mice and humans, but many glycoprotein-coding genes that influence sperm head morphology have been identified in knockout mouse studies, and most of these are conserved in humans. Therefore, we believe that glycan modification by SLC35G3 is also involved in the regulation of human sperm head morphology. Observing sperm samples from individuals with SLC35G3 mutations is the most direct approach to verify this point and is considered an important goal for future research. The following text has been added to clarify the point:

      New Line 338; While these proteins are also found in humans, it is still too early to infer the importance of SLC35G3 in the morphogenesis of human sperm heads. Observing sperm samples from individuals with SLC35G3 mutations would be the most direct approach to address this, and we consider it an important objective for future studies.

      (3) Line 194 - while the inability to pass the UTJ may indeed be a component of this infertility phenotype, I would argue that a complete lack of ability to fertilize (even with IVF but not ICSI) suggests that the primary defect is elsewhere. This statement should be removed, and the topic of these two separate mechanisms should be compared/contrasted in the discussion.

      We agree that this is an overstatement, so we changed it:

      New line 187; Thus, the defective UTJ migration is one of the primary causes of Slc35g3-/- male infertility. 

      We believe the current statement in the discussion can stay as it is. 

      Line 379; We reaffirmed that glycosylation-related genes specific to the testis play a crucial role in the synthesis, quality control, and function of glycoproteins on sperm, which are essential for male fertility through their interactions with eggs and the female reproductive system.

      (4) Did the authors consider performing TEM to assess the sperm ultrastructure and the acrosome?

      Since morphological abnormalities were evident even at the macro level, TEM was not performed in this study. In the future, we plan to use immuno-TEM against affected/non-affected glycoproteins when the antibodies become available.

      (5) I would argue that Figure 3 should not be labeled as "essential", given the abnormal sperm head morphology compared to humans, the relatively modest difference between the groups on PCA, and more broadly speaking, the relatively poor correlation with morphology and human male infertility. While globozoospermia is clearly an exception, the data in this figure may not translate to human sperm and/or may not be clinically relevant even if it does.

      Indeed, other KO spermatozoa with similar morphological features are known to cause a reduction in litter size but do not result in complete infertility. As discussed in line 1, this head shape is not essential for fertilization. Reviewer 2 also pointed out that the phrase "Slc35g3 is essential for sperm head formation" is too strong; therefore, we would like to revise the title of Figure 3 to "Slc35g3 is involved in the regulation of sperm head morphology."

      (6) Have the authors generated slc35b4 KO mice?

      No, we did not. Since Slc35b4 is expressed throughout the body, a straight knockout may affect other organs or developmental processes. To investigate its role specifically in the testis, it will be necessary to generate a conditional knockout (cKO) model. As this requires considerable cost, time, and labor, we would like to leave it for future investigation.

      Reviewer #2 (Recommendations for the authors):

      (1) Lines 122-123: "it is prominently expressed in the testis, beginning 21 days postpartum (Figure 1B), suggesting expression from the secondary spermatocyte stage to the round spermatid stage in mice." Day 21 indicates the first appearance of round spermatids, but not secondary spermatocytes. Please change to the following: ...suggesting that its expression begins in round spermatids in mice.

      I agree with your comment and have revised the text accordingly (New line 114).

      (2) Figure 1E: What germ cells are they? The type of germ cells needs to be labelled on the image. Double staining with a germ cell marker would be helpful to distinguish germ cells from testicular somatic cells.

      Thank you for your comment. We replaced Figure 1E as follows.

      To distinguish germ cells from testicular somatic cells, we used the germ cell marker TRA98 antibody. Furthermore, based on the nuclear and GM130 staining pattern, we consider that the Golgi apparatus of round spermatids is labeled.

      (3) Figure 2C: The most abundant WB band is between 20 and 25 kD and is non-specific. Does the arrow point to the expected SLC35G3 band? There are two minor bands above the main non-specific band. Are both bands specific to SLC35G3? Given the strong non-specific band on WB, how specific is the immunofluorescence signal produced by this antibody? These need to be explained and discussed.

      The arrow points to the expected size (35 kDa).

      We thought that these non-specific bands could be due to blood contamination, so we retried with testicular germ cells. We confirmed that non-specific bands disappeared in the subsequent Western blot analysis. The specificity of the immunofluorescence signal is supported by its complete absence in the KO, as shown in the Supplementary Figures. We have decided to include this improved dataset. Thank you for your comment, which helped us improve the data.

      Author response image 1.

      (4) Line 184: "Slc35g3-/--derived sperm have defects in ZP binding and oolemma fusion ability, but genomic integrity is intact." Producing viable offspring does not necessarily mean that genomic integrity is intact. Suggestion: Slc35g3-/--derived sperm have defects in ZP binding and oolemma fusion ability but produce viable offspring. Likewise, the Figure S9 caption also needs to be changed.

      Thank you for your constructive comment. We have revised the text as you suggested.

      (5) Figure 3. "Slc35g3 is essential for sperm head formation". This statement is too strong. It is not essential for sperm head formation. The sperm head is still formed, but shows subtle deformation.

      Thank you for your suggestion. We changed as follows:

      Fig. 3: “Slc35g3 is involved in the regulation of sperm head morphology.”

      (6) Lines 204-205: Figure 6B: "Interestingly, some bands of sperm acrosome-associated 1 (SPACA1; 26) disappeared in Slc35g3-/- testis lysates." I don't see the absence of SPACA1 bands in -/- testis. This needs to be clearly labeled with arrows. On the contrary, the bands are stronger in Slc35g3-/- testis lysates.

      Thank you for your comment. After carefully considering your comments, we concluded that using "disappeared" is indeed inappropriate. We would like to revise the sentence as follows: New line 197; "Interestingly, SPACA1 (Sperm Acrosome Associated 1; 26) exhibited a subtle difference in banding pattern in the Slc35g3-/- testis lysate."

    1. eLife Assessment

      This study reports important negative results, showing that genetically removing the RNA-binding protein PTBP1 in astrocytes is insufficient to convert them into neurons, thereby challenging previous claims in the field. It also offers a compelling analysis of PTBP1's role in regulating astrocyte-specific splicing. The evidence is strong, as the experiments are technically sound, carefully controlled, and supported by both imaging and transcriptomic analyses.

    2. Reviewer #1 (Public review):

      Summary:

      Zhang et al. used a conditional knockout mouse model to re-examine the role of the RNA-binding protein PTBP1 in the transdifferentiation of astroglial cells into neurons. Several earlier studies reported that PTBP1 knockdown can efficiently induce the transdifferentiation of rodent glial cells into neurons, suggesting potential therapeutic applications for neurodegenerative diseases. However, these findings have been contested by subsequent studies, which in turn have been challenged by more recent publications. In their current work, Zhang et al. deleted exon 2 of the Ptbp1 gene using an astrocyte-specific, tamoxifen-inducible Cre line and investigated - using fluorescence imaging and bulk and single-cell RNA-sequencing - whether this manipulation promotes the transdifferentiation of astrocytes into neurons across various brain regions. The data strongly indicate that genetic ablation of PTBP1 is not sufficient to drive efficient conversion of astrocytes into neurons. Interestingly, while PTBP1 loss alters splicing patterns in numerous genes, these changes do not shift the astroglial transcriptome toward a neuronal profile.

      Strengths:

      Although this is not the first report of PTBP1 ablation in mouse astrocytes in vivo, this study utilizes a distinct knockout strategy and provides novel insights into PTBP1-regulated splicing events in astrocytes. The manuscript is well written, and the experiments are technically sound and properly controlled. I believe this study will be of considerable interest to the broad readership of eLife.

      Original weaknesses:

      (1) The primary point that needs to be addressed is a better understanding of the effect of exon 2 deletion on PTBP1 expression. Figure 4D shows successful deletion of exon 2 in knockout astrocytes. However - assuming that the coverage plots are CPM-normalized - the overall PTBP1 mRNA expression level appears unchanged. Figure 6A further supports this observation. This is surprising, as one would expect that the loss of exon 2 would shift the open reading frame and trigger nonsense-mediated decay of the PTBP1 transcript. Given this uncertainty, the authors should confirm the successful elimination of PTBP1 protein in cKO astrocytes using an orthogonal approach, such as Western blotting, in addition to immunofluorescence. They should also discuss possible reasons why PTBP1 mRNA abundance is not detectably affected by the frameshift.

      (2) The authors should analyze PTBP1 expression in WT and cKO substantia nigra samples shown in Figure 3 or justify why this analysis is not necessary.

      (3) Lines 236-238 and Figure 4E: The authors report an enrichment of CU-rich sequences near PTBP1-regulated exons. To better compare this with previous studies on position-specific splicing regulation by PTBP1, it would be helpful to assess whether the position of such motifs differs between PTBP1-activated and PTBP1-repressed exons.

      (4) The analyses in Figure 5 and its supplement strongly suggest that the splicing changes in PTBP1-depleted astrocytes are distinct from those occurring during neuronal differentiation. However, the authors should ensure that these comparisons are not confounded by transcriptome-wide differences in gene expression levels between astrocytes and developing neurons. One way to address this concern would be to compare the new PTBP1 cKO data with publicly available RNA-seq datasets of astrocytes induced to transdifferentiate into neurons using proneural transcription factors (e.g., PMID: 38956165).

      Point 1 has been successfully addressed in the revision by providing relevant references/discussion. Points 2-4 were addressed by including additional data/analyses.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Zhang and colleagues describes a study that investigated if deletion of PTBP1 in adult astrocytes in mice led to an astrocyte-to-neuron conversion. The study revisited the hypothesis that reduced PTBP1 expression reprogrammed astrocytes to neurons. More than 10 studies have been published on this subject, with contradicting results. Half of the studies supported the hypothesis while the other half did not. The question being addressed is an important one because if the hypothesis is correct, it can lead to exciting therapeutic applications for treating neurodegenerative diseases such as Parkinson's disease.

      In this study, Zhang and colleagues conducted a conditional mouse knockout study to address the question. They used the Cre-LoxP system to specifically delete PTBP1 in adult astrocytes. Through a series of carefully controlled experiments including cell lineage tracing, the authors found no evidence for the astrocyte-to-neuron conversion.

      The authors then carried out a key experiment that none of the previous studies on the subject did: investigating alternative splicing pattern changes in PTBP1-depleted cells using RNA-seq analysis. The idea is to compare the splicing pattern change caused by PTBP1 deletion in astrocytes to what occurs during neurodevelopment. This is an important experiment that will help illuminate whether the astrocyte-to-neuron transition occurred in the system. The result was consistent with that of the cell staining experiments: no significant transition was detected.

      These experiments demonstrate that, in this experimental setting, PTBP1 deletion in adult astrocytes did not convert the cells to neurons.

      Strengths:

      This is a well-designed, elegantly conducted, and clearly described study that addresses an important question. The conclusions provide important information to the field. To this reviewer, this study provided convincing and solid experimental evidence to support the authors' conclusions.

      My concerns in the previous review have been addressed satisfactorily.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      Zhang et al. used a conditional knockout mouse model to re-examine the role of the RNA-binding protein PTBP1 in the transdifferentiation of astroglial cells into neurons. Several earlier studies reported that PTBP1 knockdown can efficiently induce the transdifferentiation of rodent glial cells into neurons, suggesting potential therapeutic applications for neurodegenerative diseases. However, these findings have been contested by subsequent studies, which in turn have been challenged by more recent publications. In their current work, Zhang et al. deleted exon 2 of the Ptbp1 gene using an astrocyte-specific, tamoxifen-inducible Cre line and investigated, using fluorescence imaging and bulk and single-cell RNA-sequencing, whether this manipulation promotes the transdifferentiation of astrocytes into neurons across various brain regions. The data strongly indicate that genetic ablation of PTBP1 is not sufficient to drive efficient conversion of astrocytes into neurons. Interestingly, while PTBP1 loss alters splicing patterns in numerous genes, these changes do not shift the astroglial transcriptome toward a neuronal profile.

      Strengths:

      Although this is not the first report of PTBP1 ablation in mouse astrocytes in vivo, this study utilizes a distinct knockout strategy and provides novel insights into PTBP1-regulated splicing events in astrocytes. The manuscript is well written, and the experiments are technically sound and properly controlled. I believe this study will be of considerable interest to a broad readership.

      Weaknesses:

      (1) The primary point that needs to be addressed is a better understanding of the effect of exon 2 deletion on PTBP1 expression. Figure 4D shows successful deletion of exon 2 in knockout astrocytes. However, assuming that the coverage plots are CPM-normalized, the overall PTBP1 mRNA expression level appears unchanged. Figure 6A further supports this observation. This is surprising, as one would expect that the loss of exon 2 would shift the open reading frame and trigger nonsense-mediated decay of the PTBP1 transcript. Given this uncertainty, the authors should confirm the successful elimination of PTBP1 protein in cKO astrocytes using an orthogonal approach, such as Western blotting, in addition to immunofluorescence. They should also discuss possible reasons why PTBP1 mRNA abundance is not detectably affected by the frameshift.

      We thank the reviewer for raising this important point. Indeed, the deletion of exon 2 introduces a frameshift that is predicted to disrupt the PTBP1 open reading frame and trigger nonsense-mediated decay (NMD). While our CPM-normalized coverage plots (Figure 4D) and gene-level expression analysis (Figure 6A) suggest that PTBP1 mRNA levels remain largely unchanged in cKO astrocytes, we acknowledge that this observation is counterintuitive and merits further clarification.

      We suspect that the process of brain tissue dissociation and FACS sorting for bulk or single-cell RNA-seq may enrich for nuclear material and thus dilute the NMD signal, since NMD occurs in the cytoplasm. Alternatively, the transcripts (like those of some other genes) may escape NMD through unknown mechanisms. Although a frameshift is a strong predictor of NMD, it does not guarantee that NMD will occur in every case. (lines 346-353)

      Regarding the validation of PTBP1 protein depletion in cKO astrocytes by Western blotting, we acknowledge that orthogonal approaches to confirm PTBP1 elimination would address uncertainty around the effect of exon 2 deletion on PTBP1 expression. However, the low yield of cKO astrocytes via FACS makes it difficult to obtain sufficient material for immunoblot detection of PTBP1 depletion. On average, 3-5 adult animals per genotype (with three different alleles) are needed for each biological replicate. The manuscript contains PTBP1 immunofluorescence staining of brain slides to demonstrate PTBP1 deletion (Figures 1-2, Figure 3 supplement 1). Our characterization of this Ptbp1 deletion allele in other contexts shows the loss of full-length PTBP1 protein in ESCs by Western blotting (PMID: 30496473). Furthermore, germline homozygous mutant mice do not survive beyond embryonic day 6, supporting the conclusion that it is a loss-of-function allele.

      (2) The authors should analyze PTBP1 expression in WT and cKO substantia nigra samples shown in Figure 3 or justify why this analysis is not necessary.

      We thank the reviewer for raising this important question. Our astrocyte-specific PTBP1 knockout (KO) mouse model is designed to delete PTBP1 in all astrocytes throughout the mouse brain, and we have systematically verified PTBP1 elimination in different brain regions (cortex and striatum) at multiple time points (from 4w to 12w after tamoxifen administration). Nevertheless, we agree that it remains necessary and important to demonstrate whether the observed lack of astrocyte-to-neuron conversion is indeed associated with sufficient PTBP1 depletion.

      We have analyzed PTBP1 expression in the substantia nigra, as we did in the cortex and striatum, and added a new figure (Figure 3-figure supplement 1) to show the results. In cKO samples, tdT+ cells lack PTBP1 immunostaining, and there is no overlap between NeuN+ and tdT+ signals. These results show effective PTBP1 depletion in the substantia nigra, similar to that observed in the cortex and striatum. (lines 221-224)

      (3) Lines 236-238 and Figure 4E: The authors report an enrichment of CU-rich sequences near PTBP1-regulated exons. To better compare this with previous studies on position-specific splicing regulation by PTBP1, it would be helpful to assess whether the position of such motifs differs between PTBP1-activated and PTBP1-repressed exons.

      We thank the reviewer for this insightful comment. We agree that assessing the positional distribution of CU-rich motifs between PTBP1-activated and PTBP1-repressed exons would provide valuable insight into the position-specific regulatory mechanisms of PTBP1. In response, we have performed separate motif enrichment analyses for PTBP1-activated and PTBP1-repressed exons and examined whether their positional patterns differ (Figure 4–figure supplement 2).

      Our analysis revealed that CU-rich motifs were significantly enriched in the upstream introns of exons both activated and repressed by PTBP1 loss, with higher enrichment observed for repressed exons (Enrichment ratio = 2.14, q = 9.00×10⁻⁵) than for activated exons (Enrichment ratio = 1.72, q = 7.75×10⁻⁵) (Figure 4–figure supplement 2B–C). In contrast, no CU-rich motifs were found downstream of activated exons (Figure 4–figure supplement 2D), while a weak, non-significant enrichment was observed downstream of repressed exons (Enrichment ratio = 1.21, q = 0.225; Figure 4–figure supplement 2E). These results do not fully fit with a couple of earlier PTBP1 CLIP studies showing differential PTBP1 binding for repressed vs activated exons, but are more in line with the Black Lab study (PMID: 24499931) showing that PTBP1 binds upstream introns of both repressed and activated exons. In either case, PTBP1 affects a diverse set of alternative exons and likely involves diverse context-dependent binding patterns (lines 244-257).

      (4) The analyses in Figure 5 and its supplement strongly suggest that the splicing changes in PTBP1-depleted astrocytes are distinct from those occurring during neuronal differentiation. However, the authors should ensure that these comparisons are not confounded by transcriptome-wide differences in gene expression levels between astrocytes and developing neurons. One way to address this concern would be to compare the new PTBP1 cKO data with publicly available RNA-seq datasets of astrocytes induced to transdifferentiate into neurons using proneural transcription factors (e.g., PMID: 38956165).

      We would like to express our gratitude for the thoughtful feedback. We agree that transcriptome-wide differences in gene expression between astrocytes and developing neurons could confound the interpretation of splicing differences. To address this concern, we have incorporated publicly available RNA-seq datasets from studies in which astrocytes are reprogrammed into neurons using proneural transcription factors, Ngn2 or PmutNgn2 (PMID: 38956165).

      The results of principal component analysis (PCA) for splicing profiles revealed that the in vivo splicing profiles from this study and the in vitro splicing profiles from PMID 38956165 are well separated on PC1 and PC2. While Ngn2/PmutNgn2-induced neurons and control astrocytes started to show distinction on PC3 (and to some degree on PC4), Ptbp1 cKO samples remained tightly grouped with control astrocytes and showed no directional shift toward the neuronal cluster (Figure 5–figure supplement 2B). These findings further support the conclusion that PTBP1 depletion in mature astrocytes does not induce a neuronal-like splicing program, even when compared against neurons derived from the astrocyte lineage (lines 306-318).

      The pairwise correlation analysis of percent spliced in (PSI) values between Ptbp1 cKO astrocytes, control astrocytes, and induced neurons confirmed that Ptbp1 cKO astrocytes are highly similar to control astrocytes (ρ = 0.81) and clearly distinct from induced neurons (ρ = 0.62) (Figure 5–figure supplement 2C), reinforcing the notion that PTBP1 loss alone is insufficient to drive a neuronal-like splicing transition (lines 319-336).
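
      As an illustration of this kind of comparison, the following is a minimal sketch of a pairwise correlation between percent-spliced-in (PSI) vectors. The response reports ρ without naming the estimator, so a rank (Spearman) correlation is assumed here. The function names and the PSI values are hypothetical and are not taken from the manuscript.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical PSI values for the same five exons in three sample groups.
psi_control = [0.10, 0.35, 0.50, 0.80, 0.95]
psi_cko = [0.12, 0.30, 0.55, 0.85, 0.90]
psi_neuron = [0.90, 0.20, 0.70, 0.15, 0.40]

if __name__ == "__main__":
    print("cKO vs control:", spearman_rho(psi_cko, psi_control))
    print("cKO vs neuron: ", spearman_rho(psi_cko, psi_neuron))
```

      In a real analysis, each vector would hold one PSI value per alternative exon, restricted to the exons quantified in every sample being compared.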

      Consistent with the analysis of splicing profiles, PCA of gene expression profiles showed that control and Ptbp1 cKO astrocytes clustered tightly together and showed no directional shift toward the neuronal cluster, while Ngn2/PmutNgn2-induced neurons and control astrocytes were distributed across a broader range (Figure 6–figure supplement 1A–B). Correlation analysis further supported this result, with a strong similarity between Ptbp1 cKO and control astrocytes (ρ = 0.97), and low similarity between Ptbp1 cKO astrocytes and induced neurons (ρ = 0.27) (Figure 6–figure supplement 1C). These findings indicate that, even with PTBP1 loss, cKO astrocytes retain a transcriptional profile very distinct from that of neurons, underscoring that Ptbp1 deficiency alone does not induce astrocyte-to-neuron reprogramming at the transcriptomic level (lines 366-373).

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Zhang and colleagues describes a study that investigated whether the deletion of PTBP1 in adult astrocytes in mice led to an astrocyte-to-neuron conversion. The study revisited the hypothesis that reduced PTBP1 expression reprogrammed astrocytes to neurons. More than 10 studies have been published on this subject, with contradicting results. Half of the studies supported the hypothesis while the other half did not. The question being addressed is an important one because if the hypothesis is correct, it can lead to exciting therapeutic applications for treating neurodegenerative diseases such as Parkinson's disease.

      In this study, Zhang and colleagues conducted a conditional mouse knockout study to address the question. They used the Cre-LoxP system to specifically delete PTBP1 in adult astrocytes. Through a series of carefully controlled experiments, including cell lineage tracing, the authors found no evidence for the astrocyte-to-neuron conversion.

      The authors then carried out a key experiment that none of the previous studies on the subject did: investigating alternative splicing pattern changes in PTBP1-depleted cells using RNA-seq analysis. The idea is to compare the splicing pattern change caused by PTBP1 deletion in astrocytes to what occurs during neurodevelopment. This is an important experiment that will help illuminate whether the astrocyte-to-neuron transition occurred in the system. The result was consistent with that of the cell staining experiments: no significant transition was detected.

      These experiments demonstrate that, in this experimental setting, PTBP1 deletion in adult astrocytes did not convert the cells to neurons.

      Strengths:

      This is a well-designed, elegantly conducted, and clearly described study that addresses an important question. The conclusions provide important information to the field.

      To this reviewer, this study provided convincing and solid experimental evidence to support the authors' conclusions.

      Weaknesses:

      The Discussion in this manuscript is short and can be expanded. Can the authors speculate what led to the contradictory results in the published studies? The current study, in combination with the study published in Cell in 2021 by Wang and colleagues, suggests that the observed difference is not caused by the difference between knockdown and knockout. Is it possible that other glial cell types are responsible for the transition? If so, what cells? Oligodendrocytes?

      We are grateful for the reviewer’s careful reading and valuable suggestions. We have expanded the Discussion to include discussion of possible origins of glial cells responsible for neuronal transition. (lines 441-461)

      Reviewer #1 (Recommendations for the authors):

      (1) Throughout the text and figures, it is customary to write loxP with a capital "P".

      We have capitalized “P” in loxP throughout the text and figures.

      (2) It would be helpful to indicate the brain regions analyzed above the images in Figure 1B-C, Figure 2A-B, Figure 1 - Supplement 3, and Figure 2 - Supplement 2, as was done in Figure 1 - Supplement 1.

      The labels indicating brain regions of corresponding images have been added to the figures. 

      (3) The arrowheads in Figure 1C, Figure 2B, Figure 3, and several supplemental panels are nearly equilateral triangles, making their direction difficult to discern. Consider using a more slender or indented design (e.g., ➤).

      We have replaced triangular arrowheads with indented arrowheads in the figures. 

      (4) Lines 181-209: This section should be revised, given that the striatum is not a midbrain structure.

      We have revised this section to reflect our analysis of the striatum as a brain region of the nigrostriatal pathway rather than a midbrain structure. 

      Reviewer #2 (Recommendations for the authors):

      In Supplemental Figure 1, the two open triangles are almost indistinguishable. It would be better if the colors of these open triangles were changed so that it is easier to tell what's what. There is not enough contrast between white and yellow.

      We have changed the open triangle arrowheads to solid yellow and violet arrowheads to improve contrast between labels.

    1. eLife Assessment

      This computational study examines how neurons in the songbird premotor nucleus HVC might generate the precise, sparse burst sequences that drive adult song. The findings would be useful for understanding how intrinsic conductances and HVC microcircuitry may produce neural sequences, but the work is incomplete because of arbitrary network assumptions, insufficient consideration of biological details such as how silent gaps in song sequences are represented, and failure to incorporate interactions with auditory and brainstem inputs. As a result, the study offers only a modest conceptual advance over prior models.

    2. Reviewer #2 (Public review):

      Summary:

      In this paper, the authors use numerical simulations to try to understand better a major experimental discovery in songbird neuroscience from 2002 by Richard Hahnloser and collaborators. The 2002 paper found that a certain class of projection neurons in the premotor nucleus HVC of adult male zebra finch songbirds, the neurons that project to another premotor nucleus RA, fired sparsely (once per song motif) and precisely (to about 1 ms accuracy) during singing.

      The experimental discovery is important to understand since it initially suggested that the sparsely firing RA-projecting neurons acted as a simple clock that was localized to HVC and that controlled all details of the temporal hierarchy of singing: notes, syllables, gaps, and motifs. Later experiments suggested that the initial interpretation might be incomplete: that the temporal structure of adult male zebra finch songs instead emerged in a more complicated and distributed way, still not well understood, from the interaction of HVC with multiple other nuclei, including auditory and brainstem areas. So at least two major questions remain unanswered more than two decades after the 2002 experiment: What is the neurobiological mechanism that produces the sparse precise bursting: is it a local circuit in HVC or is it some combination of external input to HVC and local circuitry? And how is the sparse precise bursting in HVC related to a songbird's vocalizations?

      The authors only investigate part of the first question, whether the mechanism for sparse precise bursts is local to HVC. They do so indirectly, by using conductance-based Hodgkin-Huxley-like equations to simulate the spiking dynamics of a simplified network that includes three known major classes of HVC neurons and such that all neurons within a class are assumed to be identical. A strength of the calculations is that the authors include known biophysically deduced details of the different conductances of the three major classes of HVC neurons, and they take into account what is known, based on sparse paired recordings in slices, about how the three classes connect to one another. One weakness of the paper is that the authors make arbitrary and not-well-motivated assumptions about the network geometry, and they do not use the flexibility of their simulations to study how their results depend on their network assumptions. A second weakness is that they ignore many known experimental details such as projections into HVC from other nuclei, dendritic computations (the somas and dendrites are treated by the authors as point-like isopotential objects), the role of neuromodulators, and known heterogeneity of the interneurons. These weaknesses make it difficult for readers to know the relevance of the simulations for experiments and for advancing theoretical understanding.

      Strengths:

      The authors use conductance-based Hodgkin-Huxley-like equations to simulate spiking activity in a network of neurons intended to model more accurately songbird nucleus HVC of adult male zebra finches. Spiking models are much closer to experiments than models based on firing rates or on 2-state neurons.

      The authors include information deduced from modeling experimental current-clamp data such as the types and properties of conductances. They also take into account how neurons in one class connect to neurons in other classes via excitatory or inhibitory synapses, based on sparse paired recordings in slices by other researchers.

      The authors obtain some new results of modest interest such as how changes in the maximum conductances of four key channels (e.g., A-type K+ currents or Ca-dependent K+ currents) influence the structure and propagation of bursts, while simultaneously being able to mimic accurately current-clamp voltage measurements.

      Weaknesses:

      One weakness of this paper is the lack of a clearly stated, interesting, and relevant scientific question to try to answer. The authors do not discuss adequately in their introduction which questions concerning HVC neural dynamics and its role in producing vocalizations recent experimental and theoretical work has failed to explain. The authors do not discuss adequately why they chose the approach of their paper and how their results address some of these questions.

      For example, the authors need to explain in more detail how their calculations relate to the works of Daou et al, J. Neurophys. 2013 (which already fitted spiking models to neuronal data and identified certain conductances), to Jin et al J. Comput. Neurosci. 2007 (which already discussed how to get bursts using some experimental details), and to the rather similar paper by E. Armstrong and H. Abarbanel, J. Neurophys 2016, which already postulated and studied sequences of microcircuits in HVC. This last paper is not even cited by the authors.

      The authors' main achievement is to show that simulations of a certain simplified and idealized network of spiking neurons, that includes some experimental details but ignores many others, can match some experimental results like current-clamp-derived voltage time series for the three classes of HVC neurons (although this was already reported in earlier work by Daou and collaborators in 2013), and simultaneously the robust propagation of bursts with properties similar to those observed in experiments. The authors also present results about how certain neuronal details and burst propagation change when certain key maximum conductances are varied.

      But these are weak conclusions for two reasons. First, the authors did not do enough calculations to allow the reader to understand how many parameters were needed to obtain these fits and whether simpler circuits, say with fewer parameters and simpler network topology, could do just as well. Second, many previous researchers have demonstrated robust burst propagation in a variety of feed-forward models. So what is new and important about the authors' results compared to the previous computational papers?

      Also missing is a discussion, or at least an acknowledgement, of the fact that not all of the fine experimental details of undershoots, latencies, spike structure, spike accommodation, etc may be relevant for understanding vocalization. While it is nice to know that some model can match these experimental details and produce realistic bursts, that does not mean that all of these details are relevant for the function of producing precise vocalizations. Scientific insights in biology often require exploring which of the many observed details can be ignored, and especially identifying the few that are essential for answering some questions. As one example, if HVC-X neurons are completely removed from the authors' model, does one still get robust and reasonable burst propagation of HVC-RA neurons? While part of nucleus HVC acts as a premotor circuit that drives nucleus RA, part of HVC is also related to learning. It is not clear that HVC-X neurons, which carry out some unknown calculation and transmit information to area X in a learning pathway, are relevant for burst production and propagation of HVC-RA neurons, and so relevant for vocalization. Simulations provide a convenient and direct way to explore questions of this kind.

      One key question to answer is whether the bursting of HVC-RA projection neurons is based on a mechanism local to HVC or is some combination of external driving (say from auditory nuclei) and local circuitry. The authors do not contribute to answering this question because they ignore external driving and assume that the mechanism is some kind of intrinsic feed-forward circuit, which they put in by hand in a rather arbitrary and poorly justified way, by assuming the existence of small microcircuits consisting of a few HVC-RA, HVC-X, and HVC-I neurons that somehow correspond to "sub-syllabic segments". To my knowledge, experiments do not suggest the existence of such microcircuits nor does theory suggest the need for such microcircuits.

      Another weakness of this paper is an unsatisfactory discussion of how the model was obtained, validated, and simulated. The authors should state as clearly as possible, in one location such as an appendix, what is the total number of independent parameters for the entire network and how parameter values were deduced from data or assigned by hand. With enough parameters and variables, many details can be fit arbitrarily accurately so researchers have to be careful to avoid overfitting. If parameter values were obtained by fitting to data, the authors should state clearly what was the fitting algorithm (some iterative nonlinear method, whose results can depend on the initial choice of parameters), what was the error function used for fitting (sum of least squares?), and what data were used for the fitting.

      The authors should also state clearly what is the dynamical state of the network, the vector of quantities that evolve over time. (What is the dimension of that vector, which is also the number of ordinary differential equations that have to be integrated?) The authors do not mention what initial state was used to start the numerical integrations, whether transient dynamics were observed and what were their properties, or how the results depend on the choice of initial state. The authors do not discuss how they determined that their model was programmed correctly (it is difficult to avoid typing errors when writing several pages or more of a code in any language) or how they determined the accuracy of the numerical integration method beyond fitting to experimental data, say by varying the time step size over some range or by comparing two different integration algorithms.
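
      The time-step check suggested above can be sketched on a toy problem. The example below is only an illustration under stated assumptions: it integrates a single passive membrane compartment (not the authors' conductance-based HVC model) with forward Euler, compares the result with the closed-form solution, and halves the step size to see whether the error shrinks as expected for a first-order method. All parameter values and function names are hypothetical.

```python
from math import exp

# Hypothetical passive-membrane parameters (illustrative only, not from the manuscript).
G_LEAK = 0.1    # leak conductance
E_LEAK = -65.0  # leak reversal potential (mV)
C_M = 1.0       # membrane capacitance
I_EXT = 1.0     # constant injected current
V0 = -65.0      # initial membrane voltage (mV)


def simulate_euler(dt, t_end):
    """Integrate C dV/dt = -g_L (V - E_L) + I with forward Euler; return V(t_end)."""
    n_steps = round(t_end / dt)
    v = V0
    for _ in range(n_steps):
        dv_dt = (-G_LEAK * (v - E_LEAK) + I_EXT) / C_M
        v += dt * dv_dt
    return v


def exact_voltage(t):
    """Closed-form solution of the same linear equation."""
    v_inf = E_LEAK + I_EXT / G_LEAK
    tau = C_M / G_LEAK
    return v_inf + (V0 - v_inf) * exp(-t / tau)


def euler_error(dt, t_end=10.0):
    """Absolute error of the Euler solution at t_end for a given step size."""
    return abs(simulate_euler(dt, t_end) - exact_voltage(t_end))


if __name__ == "__main__":
    for dt in (0.1, 0.05, 0.025):
        print(f"dt = {dt}: |error| = {euler_error(dt):.6f} mV")
```

      For a full spiking network no closed-form solution exists, so the analogous check would compare runs at successively smaller step sizes, or runs from two different integration schemes, against each other.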

      Also disappointing is that the authors do not make any predictions to test, except rather weak ones such as that varying a maximum conductance sufficiently (which might be possible by using dynamic clamps) might cause burst propagation to stop or change its properties. Based on their results, the authors do not make suggestions for further experiments or calculations, but they should.

      Comments on revised version:

      The second version, unfortunately, did not address most of the substantive comments so that, while some parts of the discussion were expanded, most of the serious scientific weaknesses mentioned in the first round of review remain. The revised preprint is not a substantive improvement over the first.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The paper presents a model for sequence generation in the zebra finch HVC, which adheres to cellular properties measured experimentally. However, the model is fine-tuned and exhibits limited robustness to noise inherent in the inhibitory interneurons within the HVC, as well as to fluctuations in connectivity between neurons. Although the proposed microcircuits are introduced as units for sub-syllabic segments (SSS), the backbone of the network remains a feedforward chain of HVC_RA neurons, similar to previous models.

      Strengths:

      The model incorporates all three of the major types of HVC neurons. The ion channels used and their kinetics are based on experimental measurements. The connection patterns of the neurons are also constrained by the experiments.

      Weaknesses:

      The model is described as consisting of micro-circuits corresponding to SSS. This presentation gives the impression that the model's structure is distinct from previous models, which connected HVC_RA neurons in feedforward chain networks (Jin et al 2007, Li & Greenside, 2006; Long et al 2010; Egger et al 2020). However, the authors implement single HVC_RA neurons into chain networks within each micro-circuit and then connect the end of the chain to the start of the chain in the subsequent micro-circuit. Thus, the HVC_RA neuron in their model forms a single-neuron chain. This structure is essentially a simplified version of earlier models.

      In the model of the paper, the chain network drives the HVC_I and HVC_X neurons. The role of the micro-circuits is more significant in organizing the connections: specifically, from HVC_RA neurons to HVC_I neurons, and from HVC_I neurons to both HVC_X and HVC_RA neurons.

      We thank Reviewer 1 for their thoughtful comments.

      While the reviewer is correct that the propagation of sequential activity in this model is primarily carried by HVC<sub>RA</sub> neurons in a feed-forward manner, we need to emphasize that this is true only if there is no intrinsic or synaptic perturbation to the HVC network. For example, we showed in Figures 10 and 12 how altering the intrinsic properties of HVC<sub>X</sub> neurons or of interneurons disrupts sequence propagation. In other words, while HVC<sub>RA</sub> neurons are the key forces that carry the chain forward, the interplay between excitation and inhibition in our network, as well as the intrinsic parameters of all classes of HVC neurons, are equally important forces in carrying the chain of activity forward. Thus, the stability of activity propagation necessary for song production depends on a finely balanced network of HVC neurons, with all classes contributing to the overall dynamics. Moreover, all existing models that describe premotor sequence generation in the HVC either assume a distributed model (Elmaleh et al., 2021), in which local HVC circuitry is not sufficient to advance the sequence but rather depends upon moment-to-moment feedback through Uva (Hamaguchi et al., 2016), or rely on intrinsic connections within HVC to propagate sequential activity. In the latter case, some models assume that HVC is composed of multiple discrete subnetworks that encode individual song elements (Glaze & Troyer, 2013; Long & Fee, 2008; Wang et al., 2008) but lack the local connectivity to link the subnetworks, while other models assume that HVC may have sufficient information in its intrinsic connections to form a single continuous network sequence (Long et al. 2010). The HVC model we present extends the concept of a feedforward network by incorporating additional neuronal classes that influence the propagation of activity (interneurons and HVC<sub>X</sub> neurons). We have shown that any disturbance of the intrinsic or synaptic conductances of these latter neurons will disrupt activity in the circuit even when HVC<sub>RA</sub> neuron properties are maintained.

      In regard to the similarities between our model and earlier models, several aspects of our model distinguish it from prior work. In short, while several models of how sequence is generated within HVC have been proposed (Cannon et al., 2015; Drew & Abbott, 2003; Egger et al., 2020; Elmaleh et al., 2021; Galvis et al., 2018; Gibb et al., 2009a, 2009b; Hamaguchi et al., 2016; Jin, 2009; Long & Fee, 2008; Markowitz et al., 2015), all of these models rely on intrinsic HVC circuitry to propagate sequential activity, on extrinsic feedback to advance the sequence, or on both. These models do not capture the complex details of spike morphology, do not include the right ionic currents, do not incorporate all classes of HVC neurons, or do not generate realistic firing patterns as seen in vivo. Our model is the first biophysically realistic model that incorporates all classes of HVC neurons and their intrinsic properties. We tuned the intrinsic and synaptic properties based on the traces collected by Daou et al. (2013) and Mooney and Prather (2005), as shown in Figure 3. The three classes of model neurons incorporated into our network, as well as the synaptic currents that connect them, are based on Hodgkin-Huxley formalisms containing ion channels and synaptic currents that have been pharmacologically identified. This is an advancement over prior models that primarily focused on the role of synaptic interactions or external inputs. The model is based on a feedforward chain of microcircuits that encode the different sub-syllabic segments and that interact with each other through structured feedback inhibition, defining an ordered sequence of cell firing. Moreover, while several models highlight the critical role of inhibitory interneurons in shaping the timing and propagation of bursts of activity in HVC<sub>RA</sub> neurons, our work offers an intricate and comprehensive model that helps to explain this critical role played by inhibition in shaping song dynamics and ensuring sequence propagation.

      How useful is this concept of micro-circuits? HVC neurons fire continuously even during the silent gaps. There are no SSS during these silent gaps.

      Regarding the concern about the usefulness of the 'microcircuit' concept in our study, we appreciate the comment and we are glad to clarify its relevance in our network. While we acknowledge that HVC<sub>RA</sub> neurons interconnect microcircuits, our model's dynamics are still best described within the framework of microcircuitry particularly due to the firing behavior of HVC<sub>X</sub> neurons and interneurons. Here, we are referring to microcircuits in a more functional sense, rather than rigid, isolated spatial divisions (Cannon et al. 2015), and we now make this clear on page 21. A microcircuit in our model reflects the local rules that govern the interaction between all HVC neuron classes within the broader network, and that are essential for proper activity propagation. For example, HVC<sub>INT</sub> neurons belonging to any microcircuit burst densely and at times other than the moments when the corresponding encoded SSS is being “sung”. What makes a particular interneuron belong to this microcircuit or the other is merely the fact that it cannot inhibit HVC<sub>RA</sub> neurons that are housed in the microcircuit it belongs to. In particular, if HVC<sub>INT</sub> inhibits HVC<sub>RA</sub> in the same microcircuit, some of the HVC<sub>RA</sub> bursts in the microcircuit might be silenced by the dense and strong HVC<sub>INT</sub> inhibition breaking the chain of activity again. Similarly, HVC<sub>X</sub> neurons were selected to be housed within microcircuits due to the following reason: if an HVC<sub>X</sub> neuron belonging to microcircuit i sends excitatory input to an HVC<sub>INT</sub> neuron in microcircuit j, and that interneuron happens to select an HVC<sub>RA</sub> neuron from microcircuit i, then the propagation of sequential activity will halt, and we’ll be in a scenario similar to what was described earlier for HVC<sub>INT</sub> neurons inhibiting HVC<sub>RA</sub> neurons in the same microcircuit.

      We agree that there are no sub-syllabic segments described during the silent gaps, and we thank the reviewer for pointing this out. Although silent gaps are integral to the overall process of song production, we have not elaborated on them in this model due to the lack of a clear, biophysically grounded representation for the gaps themselves at the level of HVC. Our primary focus has been on modeling the active, syllable-producing phases of the song, where the HVC network’s sequential dynamics are critical for song. However, one can think of silent gaps as being encoded via mechanisms similar to those that encode SSSs, where each gap is encoded by similar microcircuits composed of the three classes of HVC neurons (let’s call them GAP rather than SSS) that are active only during the silent gaps. In this case, the propagation of sequential activity is carried through the GAPs from the last SSS of the previous syllable to the first SSS of the subsequent syllable. This is now described more clearly on page 22 of the manuscript.

      A significant issue of the current model is that the HVC_RA to HVC_RA connections require fine-tuning, with the network functioning only within a narrow range of g_AMPA (Figure 2B). Similarly, the connections from HVC_I neurons to HVC_RA neurons also require fine-tuning. This sensitivity arises because the somatic properties of HVC_RA neurons are insufficient to produce the stereotypical bursts of spikes observed in recordings from singing birds, as demonstrated in previous studies (Jin et al 2007; Long et al 2010). In these previous works, to address this limitation, a dendritic spike mechanism was introduced to generate an intrinsic bursting capability, which is absent in the somatic compartment of HVC_RA neurons. This dendritic mechanism significantly enhances the robustness of the chain network, eliminating the need to fine-tune any synaptic conductances, including those from HVC_I neurons (Long et al 2010). Why is it important that the model should NOT be sensitive to the connection strengths?

      We thank the reviewer for the comment. While mathematical models of highly complex nonlinear biological processes only approximate biological realism, the current network is the first sufficiently biologically realistic network model of HVC that explains sequence propagation. We do not include dendritic processes in our network, although doing so would make the dynamics more realistic, for several reasons. 1) The ion channels we integrated into the somatic compartment are known pharmacologically (Daou et al. 2013), but we do not know the intrinsic properties of the dendritic compartments of HVC neurons or the cocktail of ion channels expressed there. 2) We are able to generate realistic bursting in HVC<sub>RA</sub> neurons despite the single compartment, and the main emphasis in this network is on the interactions between excitation and inhibition, the effects of ion channels in modulating sequence propagation, etc. 3) The network model already incorporates thousands of ODEs that govern the dynamics of each of the HVC neurons, so we did not want to add more complexity to the network, especially since we do not know the biophysical properties of the dendritic compartments.

      Therefore, our present focus is on somatic dynamics and the interaction between HVC<sub>RA</sub> and HVC<sub>INT</sub> neurons, but we acknowledge the importance of these processes in enhancing network resiliency. Although we agree that adding dendritic processes improves robustness, we still think that somatic processes alone can offer insightful information on the sequential dynamics of the HVC network. While the network should be robust across a wide range of parameters, it is also essential that certain parameters are designed to filter out weaker signals, ensuring that only reliable, precise patterns of activity propagate. Hence, we specifically chose to make the HVC<sub>RA</sub>-to-HVC<sub>RA</sub> excitatory connections more sensitive (narrow range of values) such that only strong, precise and meaningful stimuli can propagate through the network representing the high stereotypy and precision seen in song production.

      First, the firing of HVC_I neurons is highly noisy and unreliable. HVC_I neurons fire spontaneous, random spikes under baseline conditions. During singing, their spike timing is imprecise and can vary significantly from trial to trial, with spikes appearing or disappearing across different trials. As a result, their inputs to HVC_RA neurons are inherently noisy. If the model relies on precisely tuned inputs from HVC_I neurons, the natural fluctuations in HVC_I firing would render the model non-functional. The authors should incorporate noisy HVC_I neurons into their model to evaluate whether this noise would render the model non-functional.

      We acknowledge that under baseline and singing conditions, interneurons fire in an extremely noisy and imprecise manner, although they exhibit time-locked episodes in their activity (Hahnloser et al 2002, Kozhevnikov and Fee 2007). In order to mimic the biological variability of these neurons, our model does, in fact, include a stochastic current that reflects the intrinsic noise and random variations in interneuron firing seen in vivo (and we highlight this in the Methods). To make sure the network is resilient to this randomness in interneuron firing, we introduced a stochastic input current of the form I<sub>noise</sub>(t) = σ·ξ(t), where ξ(t) is Gaussian white noise with zero mean and unit variance, and σ is the noise amplitude. This stochastic drive was introduced to every model neuron, and it mimics the fluctuations in synaptic input arising from random presynaptic activity and background noise. For values of σ within 1-5% of the mean synaptic conductance, the stochastic current has no effect on network propagation. For larger values of σ, the desired network activity was disrupted or halted. We now discuss this on page 22 of the manuscript.
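      The stochastic drive I<sub>noise</sub>(t) = σ·ξ(t) described above can be illustrated with a minimal sketch. It is not the authors' HVC model: it assumes a single passive (leak-only) membrane, and the names `simulate_noisy_membrane`, `C_M`, `G_LEAK` and `E_LEAK`, together with their values, are hypothetical choices made only for illustration. The sketch uses the Euler-Maruyama scheme, in which the white-noise term contributes an increment σ·sqrt(dt)·N(0, 1) per time step.

```python
import numpy as np

# Hypothetical passive-membrane parameters (illustrative, not from the manuscript).
C_M = 1.0      # membrane capacitance
G_LEAK = 0.1   # leak conductance
E_LEAK = -65.0 # leak reversal potential (mV)


def simulate_noisy_membrane(sigma, t_stop=200.0, dt=0.01, seed=0):
    """Integrate C dV/dt = -g_L (V - E_L) + sigma * xi(t) with Euler-Maruyama.

    xi(t) is Gaussian white noise with zero mean and unit variance, so over
    one step of length dt the noise term integrates to
    sigma * sqrt(dt) * N(0, 1).
    """
    rng = np.random.default_rng(seed)
    n_steps = int(round(t_stop / dt))
    v = np.empty(n_steps + 1)
    v[0] = E_LEAK
    for k in range(n_steps):
        drift = -G_LEAK * (v[k] - E_LEAK) / C_M
        noise_increment = sigma * np.sqrt(dt) * rng.standard_normal() / C_M
        v[k + 1] = v[k] + drift * dt + noise_increment
    return v


if __name__ == "__main__":
    for sigma in (0.0, 0.5, 1.0):
        trace = simulate_noisy_membrane(sigma)
        print(f"sigma={sigma:.1f}  std of V = {trace.std():.3f} mV")
```

      Scaling the increment by sqrt(dt) rather than dt keeps the statistics of the voltage fluctuations independent of the chosen step size. In a full conductance-based network, the same increment would be added to each neuron's voltage update alongside the ionic and synaptic currents.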

      Second, Kosche et al. (2015) demonstrated that reducing inhibition by suppressing HVC_I neuron activity makes HVC_RA firing less sparse but does not compromise the temporal precision of the bursts. In this experiment, the local application of gabazine should have severely disrupted HVC_I activity. However, it did not affect the timing precision of HVC_RA neuron firing, emphasizing the robustness of the HVC timing circuit. This robustness is inconsistent with the predictions of the current model, which depends on finely tuned inputs and should, therefore, be vulnerable to such disruptions.

      We thank the reviewer for the comment. The differences between the Kosche et al. (2015) findings and the predictions of our model arise from differences in the aspect of HVC function we are modeling. Our model is more sensitive to inhibition, which is a designed mechanism for achieving precise song patterning. This is a modeling simplification we adopted to capture specific characteristics of HVC function. Hence, the findings of Kosche et al. (2015) do not invalidate the approach of our model, but highlight that HVC likely operates with several redundant mechanisms that together ensure temporal precision.

      Third, the reliance on fine-tuning of HVC_RA connections becomes problematic if the model is scaled up to include groups of HVC_RA neurons forming a chain network, rather than the single HVC_RA neurons used in the current work. With groups of HVC_RA neurons, the summation of presynaptic inputs to each HVC_RA neuron would need to be precisely maintained for the model to function. However, experimental evidence shows that the HVC circuit remains functional despite perturbations, such as a few degrees of cooling, micro-lesions, or turnover of HVC_RA neurons. Such robustness cannot be accounted for by a model that depends on finely tuned connections, as seen in the current implementation.

      As stated previously, our model of individual HVC<sub>RA</sub> neurons is a reductive model that focuses on understanding the mechanisms that govern sequential neural activity. We agree that scaling the model to include many HVC<sub>RA</sub> neurons poses challenges, specifically concerning the summation of presynaptic inputs. However, our model can still be adapted to a larger network without requiring the level of fine-tuning currently needed. In fact, the current fine-tuning of synaptic connections in the model is a reflection of fundamental network mechanisms rather than a limitation when scaling to a larger network. Besides, one important feature of this neural network is redundancy. Even if some neurons or synaptic connections are impaired, other neurons or pathways can compensate for these changes, allowing the activity propagation to remain intact.

      The authors examined how altering the channel properties of neurons affects the activity in their model. While this approach is valid, many of the observed effects may stem from the delicate balancing required in their model for proper function. In the current model, HVC_X neurons burst as a result of rebound activity driven by the I_H current. Rebound bursts mediated by the I_H current typically require a highly hyperpolarized membrane potential. However, this mechanism would fail if the reversal potential of inhibition is higher than the required level of hyperpolarization. Furthermore, Mooney (2000) demonstrated that depolarizing the membrane potential of HVC_X neurons did not prevent bursts of these neurons during forward playback of the bird's own song, suggesting that these bursts (at least under anesthesia, which may be a different state altogether) are not necessarily caused by rebound activity. This discrepancy should be addressed or considered in the model.

      In our HVC network model, one goal with HVC<sub>X</sub> neurons is to generate bursts in their underlying neuron population. Since HVC<sub>X</sub> neurons in our model receive only inhibitory inputs from interneurons, we rely on inhibition followed by rebound bursts orchestrated by the I<sub>H</sub> and the I<sub>CaT</sub> currents to achieve this goal. The interplay between the T-type Ca<sup>++</sup> current and the H current in our model is fundamental to generating the corresponding bursts, as together they are sufficient for producing the desired behavior in the network. Due to this interplay, we do not need significant inhibition to generate rebound bursts, because the conductance of the T-type Ca<sup>++</sup> current can be made stronger, leading to robust rebound bursting even when the degree of inhibition is not very strong. This is now highlighted on page 42 in the revised version.

      Some figures contain direct copies of figures from published papers. It is perhaps a better practice to replace them with schematics if possible.

      We deliberately kept the results of Mooney and Prather (2005) as they were originally shown, in order to compare them with our model simulations and highlight the degree of resemblance. We believe that creating schematics of the Mooney and Prather (2005) results would not have the same impact; similarly, creating a schematic of the Hahnloser et al (2002) results would not help much. However, if the reviewer still believes that we should do that, we are happy to do it.

      Reviewer #2 (Public review):

      Summary:

      In this paper, the authors use numerical simulations to try to understand better a major experimental discovery in songbird neuroscience from 2002 by Richard Hahnloser and collaborators. The 2002 paper found that a certain class of projection neurons in the premotor nucleus HVC of adult male zebra finch songbirds, the neurons that project to another premotor nucleus RA, fired sparsely (once per song motif) and precisely (to about 1 ms accuracy) during singing.

      The experimental discovery is important to understand since it initially suggested that the sparsely firing RA-projecting neurons acted as a simple clock that was localized to HVC and that controlled all details of the temporal hierarchy of singing: notes, syllables, gaps, and motifs. Later experiments suggested that the initial interpretation might be incomplete: that the temporal structure of adult male zebra finch songs instead emerged in a more complicated and distributed way, still not well understood, from the interaction of HVC with multiple other nuclei, including auditory and brainstem areas. So at least two major questions remain unanswered more than two decades after the 2002 experiment: What is the neurobiological mechanism that produces the sparse precise bursting: is it a local circuit in HVC or is it some combination of external input to HVC and local circuitry? And how is the sparse precise bursting in HVC related to a songbird's vocalizations? The authors only investigate part of the first question, whether the mechanism for sparse precise bursts is local to HVC. They do so indirectly, by using conductance-based Hodgkin-Huxley-like equations to simulate the spiking dynamics of a simplified network that includes three known major classes of HVC neurons and such that all neurons within a class are assumed to be identical. A strength of the calculations is that the authors include known biophysically deduced details of the different conductances of the three major classes of HVC neurons, and they take into account what is known, based on sparse paired recordings in slices, about how the three classes connect to one another. One weakness of the paper is that the authors make arbitrary and not well-motivated assumptions about the network geometry, and they do not use the flexibility of their simulations to study how their results depend on their network assumptions. 
A second weakness is that they ignore many known experimental details such as projections into HVC from other nuclei, dendritic computations (the somas and dendrites are treated by the authors as point-like isopotential objects), the role of neuromodulators, and known heterogeneity of the interneurons. These weaknesses make it difficult for readers to know the relevance of the simulations for experiments and for advancing theoretical understanding.

      Strengths:

      The authors use conductance-based Hodgkin-Huxley-like equations to simulate spiking activity in a network of neurons intended to model more accurately songbird nucleus HVC of adult male zebra finches. Spiking models are much closer to experiments than models based on firing rates or on 2-state neurons.

      The authors include information deduced from modeling experimental current-clamp data such as the types and properties of conductances. They also take into account how neurons in one class connect to neurons in other classes via excitatory or inhibitory synapses, based on sparse paired recordings in slices by other researchers. The authors obtain some new results of modest interest such as how changes in the maximum conductances of four key channels (e.g., A-type K+ currents or Ca-dependent K+ currents) influence the structure and propagation of bursts, while simultaneously being able to mimic accurately current-clamp voltage measurements.

      Weaknesses:

      One weakness of this paper is the lack of a clearly stated, interesting, and relevant scientific question to try to answer. In the introduction, the authors do not discuss adequately which questions recent experimental and theoretical work have failed to explain adequately, concerning HVC neural dynamics and its role in producing vocalizations. The authors do not discuss adequately why they chose the approach of their paper and how their results address some of these questions.

      For example, the authors need to explain in more detail how their calculations relate to the works of Daou et al, J. Neurophys. 2013 (which already fitted spiking models to neuronal data and identified certain conductances), to Jin et al J. Comput. Neurosci. 2007 (which already discussed how to get bursts using some experimental details), and to the rather similar paper by E. Armstrong and H. Abarbanel, J. Neurophys 2016, which already postulated and studied sequences of microcircuits in HVC. This last paper is not even cited by the authors.

      We thank the reviewer for this valuable comment, and we agree that we did not clarify enough throughout the paper the utility of our model or how it advanced our understanding of the HVC dynamics and circuitry. To that end, we revised several places of the manuscript and made sure to cite and highlight the relevance and relatedness of the mentioned papers.

      In short, and as mentioned to Reviewer 1, while several models of how sequence is generated within HVC have been proposed (Cannon et al., 2015; Drew & Abbott, 2003; Egger et al., 2020; Elmaleh et al., 2021; Galvis et al., 2018; Gibb et al., 2009a, 2009b; Hamaguchi et al., 2016; Jin, 2009; Long & Fee, 2008; Markowitz et al., 2015; Jin et al., 2007), all the models proposed either rely on intrinsic HVC circuitry to propagate sequential activity, rely on extrinsic feedback to advance the sequence or rely on both. These models do not capture the complex details of spike morphology, do not include the right ionic currents, do not incorporate all classes of HVC neurons, or do not generate realistic firing patterns as seen in vivo. Our model is the first biophysically realistic model that incorporates all classes of HVC neurons and their intrinsic properties. 

      No existing hypothesis is challenged by our model; rather, our model is a distillation of the various models that have been proposed for the HVC network. We go over this in detail in the Discussion. We believe that the network model we developed provides a step forward in describing the biophysics of HVC circuitry, and may throw new light on certain dynamics in the mammalian brain, particularly in the motor cortex and the hippocampus, where precisely timed sequential activity is crucial. We suggest that temporally precise sequential activity may be a manifestation of neural networks composed of a chain of microcircuits, each containing pools of excitatory and inhibitory neurons, with local interplay among neurons of the same microcircuit and global interplay across the various microcircuits, and with structured inhibition as well as intrinsic properties synchronizing the neuronal pools and stabilizing timing within a firing sequence.

      The authors' main achievement is to show that simulations of a certain simplified and idealized network of spiking neurons, which includes some experimental details but ignores many others, match some experimental results like current-clamp-derived voltage time series for the three classes of HVC neurons (although this was already reported in earlier work by Daou and collaborators in 2013), and simultaneously the robust propagation of bursts with properties similar to those observed in experiments. The authors also present results about how certain neuronal details and burst propagation change when certain key maximum conductances are varied. However, these are weak conclusions for two reasons. First, the authors did not do enough calculations to allow the reader to understand how many parameters were needed to obtain these fits and whether simpler circuits, say with fewer parameters and simpler network topology, could do just as well. Second, many previous researchers have demonstrated robust burst propagation in a variety of feed-forward models. So what is new and important about the authors' results compared to the previous computational papers?

      A major novelty of our work is the integration of experimental data into detailed network models. While earlier works have established robust burst propagation, our model uses realistic ion channel kinetics and feedback inhibition not only to reproduce experimental neural activity patterns but also to suggest prospective mechanisms for song sequence production in the most biophysical way possible. This aspect distinguishes our work from other feed-forward models. We go over this in detail in the Discussion. However, the reviewer is right regarding the details of the calculations conducted for the fits, and we will make sure to describe them in more detail in the Methods and throughout the manuscript.

      We believe that the network model we developed provides a step forward in describing the biophysics of HVC circuitry, and may throw new light on certain dynamics in the mammalian brain, particularly in the motor cortex and the hippocampus, where precisely timed sequential activity is crucial. We suggest that temporally precise sequential activity may be a manifestation of neural networks composed of a chain of microcircuits, each containing pools of excitatory and inhibitory neurons, with local interplay among neurons of the same microcircuit and global interplay across the various microcircuits, and with structured inhibition as well as intrinsic properties synchronizing the neuronal pools and stabilizing timing within a firing sequence.

      Also missing is a discussion, or at least an acknowledgment, of the fact that not all of the fine experimental details of undershoots, latencies, spike structure, spike accommodation, etc may be relevant for understanding vocalization. While it is nice to know that some models can match these experimental details and produce realistic bursts, that does not mean that all of these details are relevant for the function of producing precise vocalizations. Scientific insights in biology often require exploring which of the many observed details can be ignored and especially identifying the few that are essential for answering some questions. As one example, if HVC-X neurons are completely removed from the authors' model, does one still get robust and reasonable burst propagation of HVC-RA neurons? While part of the nucleus HVC acts as a premotor circuit that drives the nucleus RA, part of HVC is also related to learning. It is not clear that HVC-X neurons, which carry out some unknown calculation and transmit information to area X in a learning pathway, are relevant for burst production and propagation of HVCRA neurons, and so relevant for vocalization. Simulations provide a convenient and direct way to explore questions of this kind.

      One key question to answer is whether the bursting of HVC-RA projection neurons is based on a mechanism local to HVC or is some combination of external driving (say from auditory nuclei) and local circuitry. The authors do not contribute to answering this question because they ignore external driving and assume that the mechanism is some kind of intrinsic feed-forward circuit, which they put in by hand in a rather arbitrary and poorly justified way, by assuming the existence of small microcircuits consisting of a few HVC-RA, HVC-X, and HVC-I neurons that somehow correspond to "sub-syllabic segments". To my knowledge, experiments do not suggest the existence of such microcircuits nor does theory suggest the need for such microcircuits. 

      Recent results showed a tight correlation between the intrinsic properties of neurons and features of song (Daou and Margoliash 2020, Medina and Margoliash 2024), where adult birds that exhibit similar songs tend to have similar intrinsic properties. While this is relevant, we acknowledge that not all details may be necessary for every aspect of vocalization, and future models could simply concentrate on core dynamics and exclude certain features while still providing insights into the primary mechanisms.

      Regarding the question of whether HVC<sub>X</sub> neurons are relevant for burst propagation, given that our model includes these neurons as part of the network for completeness: the reviewer is correct that the propagation of sequential activity in this model is primarily carried by HVC<sub>RA</sub> neurons in a feed-forward manner, but only if there is no perturbation to the HVC network. For example, we have shown how altering the intrinsic properties of HVC<sub>X</sub> neurons or of interneurons disrupts sequence propagation. In other words, while HVC<sub>RA</sub> neurons are the key forces that carry the chain forward, the interplay between excitation and inhibition in our network, as well as the intrinsic parameters of all classes of HVC neurons, are equally important forces in carrying the chain of activity forward. Thus, the stability of activity propagation necessary for song production depends on a finely balanced network of HVC neurons, with all classes contributing to the overall dynamics.

      We agree with the reviewer, however, that a potential drawback of our model is that its sole focus is on local excitatory connectivity within the HVC (Kornfeld et al., 2017; Long et al., 2010), while HVC neurons receive afferent excitatory connections (Akutagawa & Konishi, 2010; Nottebohm et al., 1982) that play significant roles in their local dynamics. For example, the excitatory inputs that HVC neurons receive from Uvaeformis may be crucial in initiating (Andalman et al., 2011; Danish et al., 2017; Galvis et al., 2018) or sustaining (Hamaguchi et al., 2016) the sequential activity. While we acknowledge this limitation, our main contribution in this work is the biophysical insight into how the patterning activity in HVC is largely shaped by the intrinsic properties of the individual neurons as well as by the synaptic properties, where excitation and inhibition play a major role in enabling neurons to generate their characteristic bursts during singing. This holds irrespective of whether an external drive is injected onto the microcircuits or not. We elaborated on this further in the Discussion of the revised version.

      Another weakness of this paper is an unsatisfactory discussion of how the model was obtained, validated, and simulated. The authors should state as clearly as possible, in one location such as an appendix, what is the total number of independent parameters for the entire network and how parameter values were deduced from data or assigned by hand. With enough parameters and variables, many details can be fit arbitrarily accurately so researchers have to be careful to avoid overfitting. If parameter values were obtained by fitting to data, the authors should state clearly what the fitting algorithm was (some iterative nonlinear method, whose results can depend on the initial choice of parameters), what the error function used for fitting (sum of least squares?) was, and what data were used for the fitting.

      The authors should also state clearly the dynamical state of the network, the vector of quantities that evolve over time. (What is the dimension of that vector, which is also the number of ordinary differential equations that have to be integrated?) The authors do not mention what initial state was used to start the numerical integrations, whether transient dynamics were observed and what were their properties, or how the results depended on the choice of the initial state. The authors do not discuss how they determined that their model was programmed correctly (it is difficult to avoid typing errors when writing several pages or more of a code in any language) or how they determined the accuracy of the numerical integration method beyond fitting to experimental data, say by varying the time step size over some range or by comparing two different integration algorithms.

      We thank the reviewer again. The fitting process in our model occurred only at the first stage, where the synaptic parameters were fit to the Mooney and Prather as well as the Kosche results. No data were shared; we merely looked at the figures in those papers, checked the amplitude of the elicited currents, the magnitudes of DC-evoked excitations, etc., and replicated those in our model. While this is suboptimal, it was better for us to start with it rather than simply taking equations for synaptic currents from the literature for other types of neurons (that are not even HVC neurons or from the songbird) and integrating them into our network model. The number of ODEs that govern the dynamics of every model neuron is listed on page 10 of the manuscript as well as in the Appendix. Moreover, we highlighted the details of this fitting process in the revised version.
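
      The time-step check that the reviewer describes can be illustrated with a minimal sketch. It uses a toy leaky-membrane equation, not the HVC model; the equation, the parameter values, and all function names are assumptions chosen only for illustration. The idea is to integrate with successively halved step sizes and confirm that the error shrinks at the rate expected for the method (roughly by half per halving for forward Euler).

```python
import math

def euler_leaky(v0, t_end_ms, dt_ms, tau_ms=10.0, v_inf=-50.0):
    """Forward-Euler integration of the toy ODE dV/dt = (v_inf - V) / tau."""
    steps = round(t_end_ms / dt_ms)
    v = v0
    for _ in range(steps):
        v += dt_ms * (v_inf - v) / tau_ms
    return v

def exact_leaky(v0, t_end_ms, tau_ms=10.0, v_inf=-50.0):
    """Closed-form solution of the same toy ODE, used as the reference."""
    return v_inf + (v0 - v_inf) * math.exp(-t_end_ms / tau_ms)

def step_halving_errors(v0=-70.0, t_end_ms=20.0, dts=(0.1, 0.05, 0.025)):
    """Absolute error at t_end for each step size (hypothetical values)."""
    ref = exact_leaky(v0, t_end_ms)
    return [abs(euler_leaky(v0, t_end_ms, dt) - ref) for dt in dts]

if __name__ == "__main__":
    for dt, err in zip((0.1, 0.05, 0.025), step_halving_errors()):
        print(f"dt = {dt} ms, abs error = {err:.6f} mV")
```

      For a full network model no closed-form reference exists, so the same comparison would be made between runs at successive step sizes, or between two different integration schemes.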

      Also disappointing is that the authors do not make any predictions to test, except rather weak ones such as that varying a maximum conductance sufficiently (which might be possible by using dynamic clamps) might cause burst propagation to stop or change its properties. Based on their results, the authors do not make suggestions for further experiments or calculations, but they should.

      We agree that making experimentally testable predictions is crucial for the advancement of the model. Our predictions include testing whether eradication of a class of neurons such as HVC<sub>X</sub> neurons disrupts activity propagation, which can be done through targeted neuron elimination. This can also be done by preventing rebound bursting in HVC<sub>X</sub> through pharmacological block of the I<sub>H</sub> channels. Others include down-regulation of certain ion channels (done pharmacologically through ion channel blockers) and testing which current is fundamental for song production (there are plenty of tests based on our results, such as the SK current, the T-type Ca<sup>2+</sup> current, the A-type K<sup>+</sup> current, etc.). We incorporated these into the Discussion of the revised manuscript to better demonstrate the model's applicability and to guide future research directions.

      Main issues:

      (1) Parameters are overly fine-tuned and often do not match known biology to generate chains. This fine-tuning does not reveal fundamental insights.

      (1a) Specific conductances (e.g. AMPA) are finely tweaked to generate bursts, in part due to a lack of a dendritic mechanism for burst generation. A dendritic mechanism likely reflects the true biology of HVC neurons.

      We acknowledge that the model does not include active dendritic processes and we do not regard this as a limitation. In fact, our present approach, although simplified, is intended to focus on somatic mechanisms to identify minimal conditions required for stable sequential propagation. We know HVC<sub>RA</sub> neurons possess thin, spiny dendrites which can contribute to burst initiation and shaping. Future models that include such nonlinear dendritic mechanisms would likely reduce the need for fine tuning of specific conductances at the soma and consequently better match the known biology of HVC<sub>RA</sub> neurons. 

      In text: “While our simplified, somatically driven architecture enables better exploration of mechanisms for sequence propagation, future extensions of the model will incorporate dendritic compartments to more accurately reflect the intrinsic bursting mechanisms observed in HVC<sub>RA</sub> neurons.”

      (1b) In this paper, microcircuits are simulated and then concatenated to make the HVC chain, resulting in no representations during silent gaps. This is out of touch with the known HVC function. There is no anatomical nor functional evidence for microcircuits of the kind discussed in this paper or in the earlier and rather similar paper by Eve Armstrong and Henry Abarbanel (J. Neurophy 2016). One can write a large number of papers in which one makes arbitrary unconstrained guesses of network structure in HVC and, unless they reveal some novel principle or surprising detail, they are all going to be weak.

      Although the model is composed of sequentially activated microcircuits, the gaps between each microcircuit’s output do not represent complete silence in the network. During these periods, other neurons such as those in other microcircuits may still exhibit bursting activity. Thus, what may appear as a 'silent gap' from the perspective of a given output microcircuit is, in fact, part of the ongoing background dynamics of the larger HVC neuron network. We fully acknowledge the reviewer's point that there is no direct anatomical or physiological evidence supporting the presence of microcircuits with this structure in HVC. Our intention was not to propose the existence of such a physical model but to use it as a computational simplification to make precise sequential bursting activity feasible given the biologically realistic neuronal dynamics used. Hence, our use of 'microcircuits' refers to a modeling construct rather than a structural hypothesis. Even if the network topology is hypothetical, we still believe that the temporal structuring suggested allows us to generate specific predictions for future work about burst timing and neuronal connections.

      (1c) HVC interneuron discharge in the authors' model is overly precise; address the observation that these neurons can exhibit noisy discharge. Real HVC interneurons are noisy. This issue is critical: All reviewers strongly recommend that the authors should, at the minimum in a revision, focus on incorporating HVC-I noise in their model.

      We agree that capturing the variability in interneuron bursting is critical for biological realism. In our model, HVC interneurons receive stochastic background current that introduces variability in their firing patterns as observed in vivo. This variability is seen in our simulations and produces more biologically realistic dynamics while maintaining sequence propagation. We clarify this implementation in the Methods section. 

      (1d) Address the finding that Kosche et al show that even with reduced inhibition, HVCra neuronal timing is preserved; it is the burst pattern that is affected.

      The differences between the Kosche et al. (2015) findings and the predictions of our model arise from differences in the aspect of HVC function we are modeling. Our model is more sensitive to inhibition, which is a designed mechanism for achieving precise song patterning. This is a modeling simplification we adopted to capture specific characteristics of HVC function. 

      We acknowledged this point in the discussion: “While findings of Kosche et al. (2015) emphasize the robustness of the HVC timing circuit to inhibition, our model is more sensitive to inhibition, highlighting that HVC likely operates with several, redundant mechanisms that overall ensure temporal precision.”

      (1e) The real HVC is robust to microlesions, cooling, and HVCra neuron turnover. The model in this paper relies on precise HVCra connectivity and is not robust.

      Although our model is grounded in the biologically observed behavior of HVC neurons in vivo, we don’t claim that it fully captures the resilience seen in the HVC network. Instead, we see this as a simplified framework that helps us explore the basic principles of sequential activity. In the future, adding features like recurrent excitation, synaptic plasticity, or homeostatic mechanisms could make the model more robust.

      (1f) There is unclear motivation for Ih-driven HVCx bursting, given past findings from the Mooney group.

      Daou et al. (2013) noticed that the sag observed in HVC<sub>X</sub> and HVC<sub>INT</sub> neurons in response to hyperpolarizing current pulses (Dutar et al. 1998; Kubota and Saito 1991; Kubota and Taniguchi 1998) was completely abolished after the application of the drug ZD 7288 in all of the neurons tested, indicating that the sag in these HVC neurons is due to the hyperpolarization-activated inward current (I<sub>h</sub>). In addition, the sag and the rebound seen in these two neuron groups were larger for larger hyperpolarizing current pulses.

      (1g) The initial conditions of the network and its activity under those conditions, as well as the possible reliance on external inputs, are not defined.

      In our model, network activity is initiated through a brief, stochastic excitatory input to a small HVC<sub>RA</sub> neuron of one microcircuit. This drive represents a simplified version of external input from upstream brain regions known to project to HVC, including nuclei in its auditory pathways such as Nif and Uva. Modeling the activity of these upstream regions and their influence on HVC dynamics is ongoing work to be published in the future.

      (1h) It has been known from the time of Hodgkin and Huxley how to include temperature dependences for neuronal dynamics so another suggestion is for the authors to add such dependences for the three classes of neurons and see if their simulation causes burst frequencies to speed up or slow down as T is varied.

      We added this as a limitation to the discussion section: “Our model was run at a fixed physiological temperature, but it's well known going all the way back to Hodgkin and Huxley that both ion channel activity and synaptic dynamics can change with temperature. In future work, adding temperature scaling (like Q10 factors) could help us explore how burst timing and sequence speed change with temperature changes, and how neural activity in HVC would/would not preserve its precision under different physiological conditions.”
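
      As a rough illustration of the Q10 scaling mentioned in the quoted passage, the sketch below applies the standard rate factor Q10^((T − T_ref)/10) to a gating time constant. The reference temperature, the Q10 value, and the function names are assumptions for illustration only and are not taken from the manuscript.

```python
def q10_factor(temp_c, ref_temp_c, q10):
    """Multiplicative rate factor phi = q10 ** ((T - T_ref) / 10)."""
    return q10 ** ((temp_c - ref_temp_c) / 10.0)

def scaled_time_constant(tau_ref_ms, temp_c, ref_temp_c=40.0, q10=3.0):
    """Gating time constant at temp_c, given its value at ref_temp_c.

    Rates speed up by phi, so time constants are divided by phi.
    ref_temp_c and q10 are illustrative defaults, not fitted values.
    """
    return tau_ref_ms / q10_factor(temp_c, ref_temp_c, q10)

if __name__ == "__main__":
    # Cooling by 10 degrees with q10 = 3 slows the gating kinetics threefold.
    print(scaled_time_constant(5.0, temp_c=30.0))
```

      In a conductance-based model, each gating variable's rate functions would be multiplied by this factor, so a cooling simulation would amount to lowering temp_c and rerunning the network.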

      (2) The scope of the paper and its objectives must be clearly defined. Defining the scope and providing caveats for what is not considered will help the reader contextualize this study with other work.

      (2a) The paper does not consider the role of external inputs to HVC, which are very likely important for the capacity of the HVC chain to tile the entire song, including silent gaps.

      The role of afferent input to HVC, particularly from nuclei such as Uva and Nif, is critical in shaping the timing and initiation of HVC sequences throughout the song, including silent intervals. In fact, external inputs are likely involved in more than just triggering sequences; they may also influence the continuity of activity across motifs. However, in this study, we chose to focus on the intrinsic dynamics of HVC as a step toward understanding the internal mechanisms required for generating temporally precise sequences, and for this reason we used a simplified external input only to initiate activity in the chain.

      (2b) The paper does not consider important dendritic mechanisms that almost certainly facilitate the all-or-none bursting behavior of HVC projection neurons. The authors need to mention and discuss that current-clamped neuronal response - in which an electrode is inserted into the soma and then a constant current-step is applied - bypasses dendritic structure and dendritic processing and so is an incomplete way to characterize a neuron's properties. In particular, claiming to fit current-clamp data accurately and then claiming that one now has a biophysically accurate network model, as the authors do, is greatly misleading.

      While we addressed this in 1a, we do not suggest that our model is a fully accurate biophysical representation of the HVC network. Instead, we see it as a simplified framework that helps reveal how much of HVC’s sequential activity can be explained by somatic properties and synaptic interactions alone. However, additional biological mechanisms, like dendritic processing, are likely to play an important role and should be explored in future work.

      (2c) The introduction does not provide a clear motivation for the paper - what hypotheses are being tested? What is at stake in the model outcomes? It is not inherently informative to take a known biological representation and fine-tune a limited model to replicate that representation.

      We explicitly added the hypotheses to the revised introduction.

      (2d) There have been several published modeling efforts applied to the HVC chain (Seung, Fee, Long, Greenside, Jin, Margoliash, Abarbanel). These and others need to be introduced adequately, and it needs to be crystal clear what, if anything, the present study is adding to the canon.

      While several influential models have explored how HVC might generate sequences ranging from synfire chains to recurrent dynamics or externally driven sequences (e.g., Seung, Fee, Long, Greenside, Jin, Abarbanel, and others), these models could not capture the detailed dynamics observed in vivo. Our aim was to bridge a gap in the modeling literature by exploring how far biophysically grounded intrinsic properties and experimentally supported synaptic connections that are local to the HVC can alone produce temporally precise sequences. We have proven that these mechanisms are sufficient to generate these sequences, although some missing components (such as dendritic mechanisms or external inputs) might be needed to fully capture the complexity and robustness of HVC function.

      (2e) The authors mention learning prominently in the abstract, summary, and introduction but this paper has nothing to do with learning. Most or all mentions of learning should be deleted since they are misleading.

      We appreciate the reviewer’s observation. However, our intent in referencing learning was not to suggest that our model directly simulates learning processes, but rather to place HVC function within the broader context of song learning and production, where temporal sequencing plays a fundamental role. Yet repeated references to learning may be misleading, given that our current model does not incorporate plasticity, synaptic modification, or developmental changes. Hence, we have carefully revised the manuscript to rephrase mentions of learning unless directly relevant to context.

      (3) Using the model for hypothesis generation and prediction of experimental results.

      (3a) The utility of a model is to provide conceptual insight into how or why the real HVC functions as it does, or to predict outcomes in yet-to-be conducted experiments to help motivate future studies. This paper does not adequately achieve these goals.

      We revised the Discussion of the manuscript to better emphasize potential contributions and point out many experiments that could validate or challenge the model’s predictions. These include dynamic clamp or ion channel blockers targeting A-type K<sup>+</sup> in HVC<sub>RA</sub> neurons to assess their impact on burst precision, optogenetic disruption of inhibitory interneurons to observe changes in burst timing and sequence propagation, pharmacological modulation of I<sub>h</sub> or I<sub>CaT</sub> in HVC<sub>X</sub> and interneurons etc. 

      (3b) Additionally, it can be interesting to conduct an experiment on an existing model; for example, what happens to the HVCra chain in your model if you delete the HVCx neurons? What happens if you block NMDA receptors? Such an approach in a modeling paper can help motivate hypotheses and endow the paper with a sense of purpose.

      We agree that running targeted experiments to test our computational model such as removing an HVC neuron population or blocking a synaptic receptor can be a powerful way to generate new ideas and guide future experiments. While we didn’t include these specific tests in the current study, the model is well suited for this kind of exploration. For instance, removing interneurons could help us better understand their role in shaping the timing of HVC<sub>RA</sub> bursts. These are great directions for future experiments, and we now highlight this in the discussion as a way the model could be used to guide experiments.

      (4) Changes to the paper's organization may improve clarity.

      (4a) Nearly all equations should be moved to an Appendix so that the main part of the paper can focus on the science: assumptions made, details of simulations, conclusions obtained, and their significance. The authors present many equations without discussion which weakens the paper.

      Equations moved to appendix.

      (4b) There are many grammatical errors, e.g., verbs do not match the subject in terms of being single or plural. The authors need to run their manuscript through a grammar checker.

      Done.

      (4c) Many of the figures are poorly designed and should be substantially modified. E.g. in Figure 1B, too many colors are used, making it hard to grasp what is being plotted, and the colors are not needed. Figures 1C and 1D are entire figures taken from other papers, and there is no way a reader will be able to see or appreciate all the details when this figure is published on a single page. Figure 2 uses colors for dots that are almost identical, and the colors could be avoided by using different symbols. Figure 5 fills an entire page but most of the figure conveys no information; there is no need to show the same details for all 120 neurons, just show the top 1/3 of this figure. The same applies to Figure 7, where a lot of unnecessary information is being included. In Figure 10, the bottom time series of spikes should be replaced with a time series of rates, since no useful information can be extracted from it.

      Adjusted as requested. 

      (4d) Table 1 is long and largely uninteresting, and should be moved to an appendix.

      Table 1 moved to appendix.

      (4e) Many sentences are not carefully written, which greatly weakens the paper. As one typical example, the first sentence in the Discussion section: "In this study, we have designed a neural network model that describes [sic] zebra finch song production in the HVC." This is inaccurate; the model does not describe song production, it just explores some properties of one nucleus involved with song production. Just one or a few sentences like this would be ok, but there are so many sentences of this kind that the reader loses faith in the authors.

      Thank you for raising this point. We revised the manuscript to improve the precision of the writing. We replaced the first sentence of the discussion with this: "In this study, we developed a biophysically realistic neural network model to explore how intrinsic neuronal properties and local connectivity within the songbird nucleus HVC may support the generation of temporally precise activity sequences associated with zebra finch song."

    1. eLife Assessment

      This is a valuable analysis of STORM data that characterizes the clustering of active zones in retinogeniculate terminals across ages and in the absence of retinal waves. The design makes it possible to relate fixed time point structural data to a known outcome of activity-dependent remodeling. The latest revision has tempered the causal claims made in previous versions. The result provides solid structural support for the hypotheses regarding how activity influences the clustering of these synapses.

    2. Joint Public Review:

      Summary:

      The authors previously published a study of RGC boutons in the dLGN in developing wild-type mice and developing mutant mice with disrupted spontaneous activity. In the current manuscript, they have broken down their analysis of RGC boutons according to the number of Homer/Bassoon puncta associated with each vGluT2 cluster.

      The authors find that, in the first post-natal week, RGC boutons with multiple active zones (mAZs) are about a third as common as boutons with a single active zone (sAZ). The size of the vGluT2 cluster associated with each bouton was proportional to the number of active zones present in each bouton. Within the authors' ability to estimate these values (n=3 per group, 95% of results expected to be within ~2.5 standard deviations), these results are consistent across groups: 1) dominant eye vs. non-dominant eye, 2) wild-type mice vs. mice with activity blocked, and 3) at ages P2, P4, and P8. The authors also found that mAZs and sAZs have roughly the same number (about 1.5) of sAZs clustered around them (within 1.5 um).

      There has been much discussion with the reviewers, through multiple versions of this paper, of how to interpret these findings. Based on a large number of tests for statistical significance, the authors interpreted the presence of a statistically significant difference as evidence that "Eye-specific active zone clustering underlies synaptic competition in the developing visual system" (title of the previous version of the manuscript). The reviewers have focused on the small effect size as indicating that the small differences observed are not informative regarding this biological question. The authors have now tempered this interpretation.

      Strengths:

      The source dataset is high resolution data showing the colocalization of multiple synaptic proteins across development. Added to this data is labeling that distinguishes axons from the right eye from axons from the left eye. The first order analysis of this data showing changes in synapse density and in the occurrence of multi-active zone synapses is useful information about the development of an important model for activity dependent synaptic remodeling.

      Reviewing Editor's comment on the latest revision (without sending the paper back to the individual reviewers):

      In their latest revision, the authors have moderated earlier causal claims, incorporated additional statistical controls, and largely maintained their original interpretation of the data. While these changes address some prior concerns, the underlying issues remain. The previous review emphasized that the reported effect sizes were small and therefore hard to link to biological relevance. The authors argue that the effect sizes are large. Given the lack of a biological argument for this effect size, this point is really semantic. We would like to point out that the effect size measurement the authors used is likely a standard effect size calculation (the difference between groups is divided by the standard deviation of the groups). With only three experiments and irregular variance, it is likely that their estimates of standard deviation, and therefore effect size, are unreliable. Overall, the revisions improve presentation but do not substantively resolve the difficulty, raised earlier, of drawing strong conclusions from the data set.
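
      The point about effect-size estimates from three replicates can be illustrated with a small simulation. This is a sketch under stated assumptions, not a reanalysis of the authors' data: it draws two independent normal groups with a known true standardized difference, uses an unpaired pooled-SD Cohen's d (the authors' exact calculation may differ), and compares how widely the estimate scatters at n = 3 versus a larger n. All names and values are hypothetical.

```python
import random
import statistics

def cohens_d(a, b):
    """Standardized mean difference using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / pooled_var ** 0.5

def simulate_d(true_d, n, n_sims, seed=0):
    """Repeatedly sample two unit-variance normal groups and estimate d."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_sims):
        a = [rng.gauss(true_d, 1.0) for _ in range(n)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n)]
        estimates.append(cohens_d(a, b))
    return estimates

def interval_width(estimates, lo=0.025, hi=0.975):
    """Width of the central 95% range of the simulated estimates."""
    s = sorted(estimates)
    return s[int(hi * (len(s) - 1))] - s[int(lo * (len(s) - 1))]

if __name__ == "__main__":
    for n in (3, 30):
        width = interval_width(simulate_d(true_d=1.0, n=n, n_sims=2000))
        print(f"n = {n}: central 95% range of estimated d spans {width:.2f}")
```

      With three samples per group the standard deviation in the denominator is itself poorly estimated, so the simulated range of d is much wider than at larger n.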

    3. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Summary

      The authors previously published a study of RGC boutons in the dLGN in developing wild-type mice and developing mutant mice with disrupted spontaneous activity. In the current manuscript, they have broken down their analysis of RGC boutons according to the number of Homer/Bassoon puncta associated with each vGluT2 cluster.

      The authors find that, in the first post-natal week, RGC boutons with multiple active zones (mAZs) are about a third as common as boutons with a single active zone (sAZ). The size of the vGluT2 cluster associated with each bouton was proportional to the number of active zones present in each bouton. Within the authors' ability to estimate these values (n=3 per group, 95% of results expected to be within ~2.5 standard deviations), these results are consistent across groups: 1) dominant eye vs. non-dominant eye, 2) wild-type mice vs. mice with activity blocked, and 3) at ages P2, P4, and P8. The authors also found that mAZs and sAZs have roughly the same number (about 1.5) of sAZs clustered around them (within 1.5 um).

      However, the authors do not interpret this consistency between groups as evidence that active zone clustering is not a specific marker or driver of activity dependent synaptic segregation. Rather, the authors perform a large number of tests for statistical significance and cite the presence or absence of statistical significance as evidence that "Eye-specific active zone clustering underlies synaptic competition in the developing visual system (title)". I don't believe this conclusion is supported by the evidence.

      We have revised the title to be descriptive: "Eye-specific differences in active zone addition during synaptic competition in the developing visual system." While our correlative approach does not establish direct causality, our findings provide important structural evidence that complements existing functional studies of activity-dependent synaptic refinement. We have carefully revised the text throughout to avoid causal language, focusing instead on the developmental patterns we observe.

      Strengths

      The source dataset is high resolution data showing the colocalization of multiple synaptic proteins across development. Added to this data is labeling that distinguishes axons from the right eye from axons from the left eye. The first order analysis of this data showing changes in synapse density and in the occurrence of multi-active zone synapses is useful information about the development of an important model for activity dependent synaptic remodeling.

      Weaknesses

      In my previous review I argued that it was not possible to determine, from their analysis, whether the differences they were reporting between groups were important to the biology of the system. The authors have made some changes to their statistics (paired t-tests) and use some less derived measures of clustering. However, they still fail to present a meaningfully quantitative argument that the observed group differences are important. The authors base most of their claims on small differences between groups. There are two big problems with this practice. First, the differences between groups appear too small to be biologically important. Second, the differences between groups that are used as evidence for how the biology works are generally smaller than the precision of the authors' sampling. That is, the differences are as likely to be false positives as true positives.

      (1) Effect size. The title claims: "Eye-specific active zone clustering underlies synaptic competition in the developing visual system". Such a claim might be supported if the authors found that mAZs are only found in dominant-eye RGCs and that eye-specific segregation doesn't begin until some threshold of mAZ frequency is reached. Instead, the behavior of mAZs is roughly the same across all conditions. For example, the clear trend in Figure 4C and D is that measures of clustering between mAZ and sAZ are as similar as could reasonably be expected by the experimental design. However, some of the comparisons of very similar values produced p-values < 0.05. The authors use this fact to argue that the negligible differences between mAZ and sAZs explain the development of the dramatic differences in the distribution of ipsilateral and contralateral RGCs.

      We have changed the title to avoid implying a causal relationship between clustering and eye-specific segregation. Our key findings in Figures 4C and 4D demonstrate effect sizes >2.0 with high statistical power (Supplemental Table S2). While the absolute magnitude of differences is modest (5-7%), these high effect sizes combined with low inter-animal variability demonstrate consistent, reproducible biological phenomena. During development, small differences during critical periods can have profound downstream consequences for synaptic refinement outcomes.

      We acknowledge that significance in Figure 4 arises due to low variance between biological replicates rather than large mean differences. We have revised the text to describe these as "slight" differences and that "WT mice show a tendency toward forming more synapses near mAZ inputs," reflecting appropriate caution in our interpretation while maintaining the statistical robustness of our findings.

      (2) Sample size. Performing a large number of significance tests and comparing p-values is not hypothesis testing and is not descriptive science. At best, with large sample sizes and controls for multiple tests, this approach could be considered exploratory. With n=3 for each group, many comparisons of many derived measures, among many groups, and no control for multiple testing, this approach constitutes a random result generator.

      The authors argue that n=3 is a large sample size for the type of high resolution / large volume data being used. It is true that many electron microscopy studies with n=1 are used to reveal the patterns of organization that are possible within an individual. However, such studies cannot control individual variation and are, therefore, not appropriate for identifying subtle differences between groups.

      In response to previous critiques along these lines, the authors argue they have dealt with this issue by limiting their analysis to within-individual paired comparisons. There are several problems with their thinking in this approach. The main problem is that they did not change the logic of their arguments, only which direction they pointed the t-tests. Instead of claiming that two groups are different because p < 0.05, they say that two groups are different because one produced p < 0.05 and the other produced p > 0.05. These arguments are not statistically valid or biologically meaningful.

      We have implemented rigorous statistical controls, applying false discovery rate (FDR) correction using the Benjamini-Hochberg method (α = 0.05) within each experimental condition (age × genotype combination). This correction strategy treats each condition as addressing a distinct experimental question: “What synaptic properties differ between left eye and right eye inputs in this specific developmental stage and genotype?” The approach appropriately controls for multiple testing while preserving power to detect biologically meaningful differences. We applied FDR correction separately to the ~20-34 measurements (varying by age and genotype) within each of the six experimental conditions, resulting in condition-specific adjusted p-values reported in updated Supplemental Table S2. This correction confirmed the robustness of our key findings. We do not base conclusions solely on comparing p-values across conditions. Our interpretations focus on effect sizes, confidence intervals, and consistent patterns within each condition, with statistical significance providing supporting evidence rather than the primary basis for biological conclusions.
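
      The Benjamini-Hochberg procedure named above can be sketched as follows. This is a generic textbook implementation for one family of tests, not the authors' analysis code; the function names and the example p-values are hypothetical. In the scheme described, it would be run once per age × genotype condition on that condition's set of measurements.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that each adjusted
    # value is the minimum of p * m / rank over all equal or larger ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def reject(p_values, alpha=0.05):
    """Flag tests whose adjusted p-value is at or below alpha."""
    return [q <= alpha for q in benjamini_hochberg(p_values)]

if __name__ == "__main__":
    # Hypothetical p-values for one condition (not data from the study).
    example = [0.001, 0.012, 0.03, 0.2, 0.6]
    for p, q in zip(example, benjamini_hochberg(example)):
        print(f"raw p = {p}, adjusted p = {q:.4f}")
```

      Because the correction is applied within each condition separately, the false discovery rate is controlled per condition rather than across all six conditions combined.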

      To the best of my understanding, the results are consistent with the following model:

      RGCs form mAZs at large boutons (known)

      About a quarter of week-one RGC boutons are mAZs (new observation)

      Vesicle clustering is proportional to active zone number (~new observation)

      RGC synapse density increases during the first postnatal week (known)

      Blocking activity reduces synapse density (known)

      Contralateral eye RGCs form more and larger synapses in the lateral dLGN (known)

      While mAZ formation is known in adult and juvenile dLGN, the formation of mAZ boutons during eye-specific competition represents new information with important functional implications. Synapses with multiple release sites should be stronger than single-active-zone synapses, suggesting a structural correlate for competitive advantage during refinement.

      We demonstrate distinct developmental patterns for sAZ versus mAZ contacts during the first postnatal week. Multi-active zone density favors the dominant eye, while single active-zone synapse density from the competing eye increases from P2-P4 to match dominant-eye levels. This reveals that newly formed synapses from the competing eye predominantly contain single release sites, marking P4-P8 as a critical window for understanding molecular mechanisms driving synaptic elimination.

      Our results show that altered retinal activity patterns (β2KO mice) reduce synapse density during eye-specific competition. We relied on β2 knockout mice, which retain retinal waves and spontaneous spike activity but with disrupted patterns and output levels compared to controls. We make no claims about complete activity blockade. Previous studies using different activity manipulations (epibatidine, TTX) have examined terminal morphology, but effects on synapse density during competition remain largely unknown. Achieving complete retinal activity blockade is technically challenging, making it of interest to revisit the role of activity using more precise manipulations to control spike output and relative timing.

      With n=3 and effect sizes smaller than 1 standard deviation, a statistically significant result is about as likely to be a false positive as a true positive.
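The reasoning behind this kind of statement can be sketched numerically. The block below is an illustration only: it estimates the power of a two-sided paired t-test with n = 3 by simulation and then converts power into the share of significant results that are true positives. The prior probability that a tested effect is real is an assumption that neither the reviewer nor the authors state, and all function names are hypothetical.

```python
import math
import random
import statistics

N_ANIMALS = 3
# Two-sided 5% critical value of Student's t with 2 degrees of freedom (n = 3).
T_CRIT_DF2 = 4.303


def paired_t_rejects(diffs):
    """Return True if a two-sided paired t-test on n = 3 differences rejects at 5%."""
    sd = statistics.stdev(diffs)
    if sd == 0:
        return False
    t = statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
    return abs(t) > T_CRIT_DF2


def simulated_power(effect_size, trials=20000, seed=0):
    """Fraction of simulated experiments that reject, for a standardized effect size."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Paired differences drawn with mean = effect size and unit standard deviation.
        diffs = [rng.gauss(effect_size, 1.0) for _ in range(N_ANIMALS)]
        hits += paired_t_rejects(diffs)
    return hits / trials


def positive_predictive_value(power, alpha, prior):
    """Share of significant results that are true positives, given an assumed prior."""
    true_positives = power * prior
    false_positives = alpha * (1.0 - prior)
    return true_positives / (true_positives + false_positives)
```

As a worked case of the formula, a power of 0.2 with alpha = 0.05 and an assumed prior of 0.2 gives `positive_predictive_value(0.2, 0.05, 0.2) == 0.5`, meaning a significant result would be equally likely to be a true or a false positive. The conclusion is sensitive to the assumed prior and to the true effect size, so `simulated_power` should be run with the effect sizes actually under discussion.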

      A true-positive statistically significant result is not evidence of a meaningful deviation from a biological model.

      Our conclusions are based on results with effect sizes substantially larger than 1. Key findings demonstrate effect sizes exceeding 2.0. These large effect sizes, combined with rigorous FDR correction and low inter-animal variability, provide evidence against false positive results. During critical developmental periods, consistent structural differences, even those modest in absolute magnitude, can reflect important regulatory mechanisms that influence refinement outcomes. All statistical results, effect sizes, and power analyses are reported in Supplementary Table S2, with confidence intervals in Supplementary Table S3. We have revised the text in several places where small differences are presented to reflect appropriate caution in our interpretation.

      Providing plots that show the number of active zones present in boutons across these various conditions is useful. However, I could find no compelling deviation from the above default predictions that would influence how I see the role of mAZs in activity dependent eye-specific segregation.

      Below are critiques of most of the claims of the manuscript.

      Claim (abstract): individual retinogeniculate boutons begin forming multiple nearby presynaptic active zones during the first postnatal week.

      Confirmed by data.

      Claim (abstract): the dominant-eye forms more numerous mAZ contacts,

      Misleading: The dominant-eye (by definition) forms more contacts than the nondominant eye. That includes mAZ.

      While the dominant eye forms more total contacts, the pattern depends critically on contact type and developmental stage. The dominant eye forms more mAZ contacts across all ages (Figures 2 and S1). However, for sAZ contacts, the two eyes form similar numbers at P4, with the non-dominant eye showing increased sAZ formation during this critical period. This differential pattern by synapse type represents an important aspect of how synaptic competition unfolds structurally.

      Claim (abstract): At the height of competition, the non-dominant-eye projection adds many single active zone (sAZ) synapses

      Weak: While the individual observation is strong, it is a surprising deviation based on a single n=3 experiment in a study that performed twelve such experiments (six ages, mutant/wildtype, sAZ/mAZ)

      The difference in eye-specific sAZ formation at P2 and P8 had effect sizes of ~5.3 and ~2.7, respectively (after FDR correction the difference was still significant at P2 and trending at P8). At P4, no effect was observed by paired t-test and the 5/95% confidence interval ranged from -0.021 to 0.008 synapses/μm<sup>3</sup>. The consistency of this pattern across P2 and P8, combined with the large effect sizes, supports the reliability of this developmental finding. We report all effect sizes and power test analyses in Supplemental Table S2, and confidence intervals in Supplemental Table S3.
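For readers who want to see how a paired effect size and a 5/95% interval relate for n = 3, here is a minimal sketch. It assumes the effect size is the mean of the within-animal differences divided by their standard deviation (the response does not state which definition was used) and that the interval is a two-sided 90% t-interval on the mean difference. The function name and the density values in the usage note are invented for illustration.

```python
import math
import statistics

# 0.95 quantile of Student's t with 2 degrees of freedom (n = 3 paired samples),
# giving a two-sided 90% (5/95%) interval.
T_Q95_DF2 = 2.920


def paired_summary(dominant, non_dominant):
    """Summarize within-animal differences for exactly three paired measurements."""
    diffs = [d - n for d, n in zip(dominant, non_dominant)]
    if len(diffs) != 3:
        raise ValueError("the hard-coded t quantile is only valid for n = 3")
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # must be non-zero for the effect size
    half_width = T_Q95_DF2 * sd_diff / math.sqrt(len(diffs))
    return {
        "mean_diff": mean_diff,
        # Standardized effect size for paired data (assumed definition).
        "effect_size": mean_diff / sd_diff,
        "ci_5_95": (mean_diff - half_width, mean_diff + half_width),
    }
```

With made-up densities such as `paired_summary([0.030, 0.032, 0.034], [0.020, 0.020, 0.020])`, the differences are small in absolute terms but very consistent, so the standardized effect size is large. This is the situation in which modest mean differences and large effect sizes coexist.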

      Claim (abstract): Together, these findings reveal eye-specific differences in release site addition during synaptic competition in circuits essential for visual perception and behavior.

      False: This claim is unambiguously false. The above findings, even if true, do not argue for any functional significance to active zone clustering.

      Our phrasing “circuits essential for visual perception and behavior” referred to the general importance of binocular organization in the retinogeniculate system for visual processing and we did not intend to claim direct functional significance of our structural data. For clarity we have deleted the latter part of this sentence. In lines 35-37, the abstract now reads “Together, these findings reveal eye-specific differences in release site addition that correlate with axonal refinement outcomes during retinogeniculate refinement.”

      Claim (line 84): "At the peak of synaptic competition midway through the first postnatal week, the non-dominant-eye formed numerous sAZ inputs, equalizing the global synapse density between the two eyes"

      Weak: At one of twelve measures (age, bouton type, genotype) performed with 3 mice each, one density measure was about twice as high as expected.

      The difference in eye-specific sAZ formation at P2 and P8 had effect sizes of ~5.3 and ~2.7, respectively (after FDR correction the difference was still significant at P2 and trending at P8). At P4, no effect was observed by paired t-test and the 5/95% confidence interval ranged from -0.021 to 0.008 synapses/μm<sup>3</sup>. The consistency of this pattern across P2 and P8, combined with the large effect sizes, supports the reliability of this developmental finding. We report all effect sizes and power test analyses in Supplemental Table S2, and confidence intervals in Supplemental Table S3.

      Claim (line 172): "In WT mice, both mAZ (Fig. 3A, left) and sAZ (Fig. 3B, left) inputs showed significant eye-specific volume differences at each age."

      Questionable: There appears to be a trend, but the size and consistency is unclear.

      Claim (line 175): "the median VGluT2 cluster volume in dominant-eye mAZ inputs was 3.72 fold larger than that of non-dominant-eye inputs (Fig. 3A, left)."

      Cherry picking. Twelve differences were measured with an n of 3, 3 each time. The biggest difference of the group was cited. No analysis is provided for the range of uncertainty about this measure (2.5 standard deviations) as an individual sample or as one of twelve comparisons.

      Claim (line 174): "In the middle of eye-specific competition at P4 in WT mice, the median VGluT2 cluster volume in dominant-eye mAZ inputs was 3.72 fold larger than that of non-dominant-eye inputs (Fig. 3A, left). In contrast, β2KO mice showed a smaller 1.1 fold difference at the same age (Fig. 3A, right panel). For sAZ synapses at P4, the magnitudes of eye-specific differences in VGluT2 volume were smaller: 1.35-fold in WT (Fig. 3B, left) and 0.41-fold in β2KO mice (Fig. 3B, right). Thus, both mAZ and sAZ input size favors the dominant eye, with larger eye-specific differences seen in WT mice (see Table S3)."

      No way to judge the reliability of the analysis and trivial conclusion: To analyze effect size the authors choose the median value of three measures (whatever the middle value is). They then make four comparisons at the time point where they observed the biggest difference in favor of their hypothesis. There is no way to determine how much we should trust these numbers besides spending time with the mislabeled scatter plots. The authors then claim that this analysis provides evidence that there is a difference in vGluT2 cluster volume between dominant and non-dominant RGCs and that that difference is activity dependent. The conclusion that dominant axons have bigger boutons and that mutants that lack the property that would drive segregation would show less of a difference is very consistent with the literature. Moreover, there is no context provided about what 1.35 or 1.1 fold difference means for the biology of the system.

      We focused on P4 for biological reasons rather than post-hoc selection. P4 represents the established peak of synaptic competition when eye-specific synapse densities are globally equivalent. This is a timepoint consistently highlighted throughout our manuscript and supported by previous literature. We have modified our presentation from fold changes to measured eye-specific differences in volume (mean ± standard error) and added confidence intervals in Supplemental Table S3. The effect sizes for eye-specific differences in VGluT2 volume at P4 are robust: ~2.3 and ~1.5 for mAZ and sAZ measurements in WT mice, and ~2.5 and ~1.8 in β2KO mice, with all analyses well-powered (Supplemental Table S2).

      We were unable to identify any mislabeled scatter plots and believe all figures are correctly labeled. While dominant-eye advantage in bouton size is consistent with previous literature, our study provides the first detailed analysis of how this develops specifically during the critical period of competition, with distinct patterns for single versus multi-active zone contacts. Our data show that dominant-eye inputs have larger vesicle pools that scale with active zone number. While this suggests enhanced transmission capacity, we make no direct physiological claims based on structural data alone.

      Claim (line 189): "This shows that vesicle docking at release sites favors the dominant-eye as we previously reported but is similar for like eye type inputs regardless of AZ number."

      Contradicts core claim of manuscript: Consistent with previous literature, there is an activity dependent relative increase in vGlut2 clustering of dominant eye RGCs. The new information is that that activity dependence is more or less the same in sAZ and mAZ. The only plausible alternative is that vGlut2 scaling only increases in mAZ which would be consistent with the claims of their paper. That is not what they found. To the extent that the analysis presented in this manuscript tests a hypothesis, this is it. The claim of the title has been refuted by figure 3.

      We report the volume of docked vesicle signal (VGluT2) nearby each active zone, finding this is greater for dominant-eye synapses. Within each eye-specific synapse population, vesicle signal per active zone is similar regardless of whether these are part of single- or multi-active zone contacts. This is consistent with a modular program of active zone assembly and maintenance: core molecular programs facilitate docking at each AZ similarly regardless of how many AZs are nearby. 

      This finding does not contradict our main conclusions but rather provides insight into how synaptic advantages are structured. The dominant eye's advantage may arise in part from forming more multi-AZ contacts (which have proportionally more docked vesicles) rather than from enhanced vesicle loading per individual active zone. This organization may reflect how developmental competition operates through contact number and active zone addition rather than fundamental changes to individual release site properties.

      We have changed the title to be descriptive rather than mechanistic.

      Claim (line 235): "For the non-dominant eye projection, however, clustered mAZ inputs outnumbered clustered sAZ inputs at P4 (Fig. 4C, bottom left panel), the age when this eye adds sAZ synapses (Fig. 2C)."

      Misleading: The overwhelming trend across 24 comparisons is that the sAZ clustering looks like mAZ clustering. That is the objective and unambiguous result. Among these 24 underpowered tests (n=3), there were a few p-values < 0.05. The authors base their interpretation of cell behavior on these crossings.

      In Figures 4C and 4D we report significant results with high effect sizes (effect sizes all greater than 2; see Supplemental Table S2). The mean differences are modest (5-7%) and significance arises due to low variance between biological replicates. We acknowledge that clustering patterns are generally similar between mAZ and sAZ inputs across most conditions. We have revised the text to describe these as “slight” differences and that “WT mice show a tendency toward forming more synapses near mAZ inputs”, reflecting appropriate caution in our interpretation while noting the statistical consistency of these patterns.

      Claim (line 328): "The failure to add synapses reduced synaptic clustering and more inputs formed in isolation in the mutants compared to controls."

      Trivially true: Density was lower in mutant.

      We have rewritten the sentence for clarity: “The failure to add synapses could explain the observation that synaptic clustering was reduced and more inputs formed in isolation in the mutants compared to controls.”

      Claim (line 332): "While our findings support a role for spontaneous retinal activity in presynaptic release site addition and clustering..."

      Not meaningfully supported by evidence: I could not find meaningful differences between WT and mutant beside the already known dramatic difference in synapse density.

      We have changed the sentence to avoid overinterpreting the results. The new sentence in lines 415-417 reads: “While our results highlight developmental changes in presynaptic release site addition and clustering, activity-dependent postsynaptic mechanisms also influence input refinement at later stages.”

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Zhang and Speer examine changes in the spatial organization of synaptic proteins during eye specific segregation, a developmental period when axons from the two eyes initially mingle and gradually segregate into eye-specific regions of the dorsal lateral geniculate. The authors use STORM microscopy and immunostain presynaptic (VGluT2, Bassoon) and postsynaptic (Homer) proteins to identify synaptic release sites. Activity-dependent changes of this spatial organization are identified by comparing the β2KO mice to WT mice. They describe two types of synapses based on Bassoon clustering: the multiple active zone (mAZ) synapse and single active zone (sAZ) synapse. In this revision, the authors have added EM data to support the idea that mAZ synapses represent boutons with multiple release sites. They have also reanalyzed their data set with different statistical approaches.

      Strengths:

      The data presented is of good quality and provides an unprecedented view at high resolution of the presynaptic components of the retinogeniculate synapse during active developmental remodeling. This approach offers an advance over the previous mouse EM studies of this synapse because the CTB label allows identification of the eye from which the presynaptic terminal arises.

      Weaknesses:

      While the interpretation of this data set is much more grounded in this second revised submission, some of the authors' conclusions/statements still lack convincing supporting evidence. In particular, the data does not support the title: "Eye-specific active zone clustering underlies synaptic competition in the developing visual system". The data show that there are fewer synapses made for both contra- and ipsi- inputs in the β2KO mice-- this fact alone can account for the differences in clustering. There is no evidence linking clustering to synaptic competition. Moreover, the findings of differences in AZ# or distance between AZs that the authors report are quite small and it is not clear whether they are functionally meaningful.

      We thank the reviewer for their helpful suggestions that improved the manuscript in this revision. We have changed the title to remove the reference to “clustering” and to avoid implying any causal relationships. The new title is descriptive: “Eye-specific differences in active zone addition during synaptic competition in the developing visual system”.

      To further address the reviewer's comments, we have removed the remaining references to activity-dependent effects on synaptic development (line 36, line 96, line 415). We have also modified the text in lines 411-413 to state that “The failure to add synapses could explain the observation that synaptic clustering was reduced and more inputs formed in isolation in the mutants compared to controls.”

      We have also updated our presentation of results for Figure 4 to ensure that we do not causally link clustering to synaptic competition. In Figures 4C and 4D we report significant results with high effect sizes (effect sizes all greater than 2; see Supplemental Table S2). The mean differences are modest (5-7%) and significance arises due to low variance between biological replicates. We acknowledge that clustering patterns are generally similar between mAZ and sAZ inputs across most conditions. We have revised the text to describe these as “slight” differences and that “WT mice show a tendency toward forming more synapses near mAZ inputs”, reflecting appropriate caution in our interpretation while noting the statistical consistency of these patterns.

      Reviewer #3 (Public review):

      This study is a follow-up to a recent study of synaptic development based on a powerful data set that combines anterograde labeling, immunofluorescence labeling of synaptic proteins, and STORM imaging (Cell Reports, 2023). Specifically, they use anti-Vglut2 label to determine the size of the presynaptic structure (which they describe as the vesicle pool size), anti-Bassoon to label active zones with the resolution to count them, and anti-Homer to identify postsynaptic densities. Their previous study compared the detailed synaptic structure across the development of synapses made with contra-projecting vs. ipsi-projecting RGCs and compared this developmental profile with a mouse model with reduced retinal waves. In this study, they produce a new detailed analysis on the same data set in which they classify synapses into "multi-active zone" vs. "single-active zone" synapses and assess the number and spacing of these synapses. The authors use measurements to make conclusions about the role of retinal waves in the generation of same-eye synaptic clusters. The authors interpret these results as providing insight into how neural activity drives synapse maturation, but the strength of their conclusions is not directly tested by their analysis.

      Strengths:

      This is a fantastic data set for describing the structural details of synapse development in a part of the brain undergoing activity-dependent synaptic rearrangements. The fact that they can differentiate the eye of origin is what makes this data set unique over previous structural work. The addition of example images from the EM dataset provides confidence in their categorization scheme.

      Weaknesses:

      Though the descriptions of single vs multi-active zone synapses are important and represent a significant advance, the authors continue to make unsupported conclusions regarding the biological processes driving these changes. Although this revision includes additional information about the populations tested and the tests conducted, the authors do not address the issue raised by previous reviews. Specifically, they provide no assessment of what effect size represents a biologically meaningful result. For example, a more appropriate title is "The distribution of eye-specific single vs multi-active zone is altered in mice with reduced spontaneous activity" rather than concluding that this difference in clustering is somehow related to synaptic competition. Of course, the authors are free to speculate, but many of the conclusions of the paper are not supported by their results.

      We appreciate the reviewer’s helpful critique. We have changed the title to be descriptive and avoid implying causal relationships. 

      We have applied false discovery rate (FDR) correction using the Benjamini-Hochberg method with α = 0.05 within each experimental condition (age × genotype combination). The FDR correction treats each condition as addressing a distinct experimental question: 'What synaptic properties differ between left eye and right eye inputs in this specific developmental stage and genotype?'

      This correction strategy is appropriate because: 1) we focus our statistical comparisons within each age/genotype; 2) each age-genotype combination represents a separate biological context where different synaptic properties between eye-of-origin may be relevant; and 3) this approach controls for multiple testing within each experimental question while maintaining statistical power to detect meaningful biological differences.

      We applied FDR correction separately to the ~20-34 measurements (varying with age and genotype) within each of the six experimental conditions (P2-WT, P2-β2, P4-WT, P4-β2, P8-WT, P8-β2), resulting in condition-specific adjusted p-values. These are reported in the updated Supplemental Table S2. Figures have also been updated to reflect the FDR-adjusted values. Selected between-genotype comparisons are presented descriptively using 5/95% confidence intervals. This correction confirmed the robustness of our key findings.

      With regard to the biological significance of effect sizes, our key findings demonstrate effect sizes >2.0, indicating robust effects. During critical developmental periods, consistent structural differences, even those modest in absolute magnitude, can reflect important regulatory mechanisms that influence refinement outcomes. The differences in synaptic organization we observe occur during the first postnatal week when eye-specific competition is active, suggesting these patterns may be relevant to understanding how structural advantages emerge during synaptic refinement.

      Reviewer #1 (Recommendations for the authors):

      I have tried to understand the analysis and biology of this manuscript as best I can. I believe the analytical approach taken is not reliable and I have explained why in my public comments. I don't believe this manuscript is unique in taking this approach. I have recently published a paper on how common this approach is and why it doesn't work. I don't want to give the impression that the problem with the analysis was that it was not computationally sophisticated enough or that you did not jump through a specific statistical hoop. If I strip out the arguments that depend on misinterpretations of p-values and -instead- look at the scatterplots, I come up with a very different view of the data than what is described in the paper.

      The information in the plots could be translated into a rigorous statistical analysis of estimated differences between groups given the uncertainties of the experimental design. I don't really think that analysis would be useful. I think it would have been enough to publish the plots and report your estimates of the number of active zones in RGCs during development. I don't see evidence of an additional effect.

      We appreciate the reviewer’s helpful comments throughout the review process. Mean active zone numbers per mAZ contact are presented in Figure S2D/E. We look forward to further technical and computational advances that will help us increase our data acquisition throughput and sample sizes when designing future studies. 

      Reviewer #2 (Recommendations for the authors):

      The authors should modify the title and other text to be more consistent with the data. There is no evidence that active zone clustering has any direct relationship to synaptic competition.

      We appreciate the reviewer’s helpful suggestions to ensure appropriate language around causal effects. We have modified the title to accurately reflect the results: "Eye-specific differences in active zone addition during synaptic competition in the developing visual system." We have revised the text in the abstract, introduction, and results section for Figure 4 to be consistent with the data and not imply causality of synapse clustering on segregation phenotypes.

      Reviewer #3 (Recommendations for the authors):

      Change the title.

      We appreciate the reviewer’s feedback throughout the review process. We have modified the title to accurately reflect the results: "Eye-specific differences in active zone addition during synaptic competition in the developing visual system."

    1. eLife Assessment

      This important work advances our understanding of NMDAR diversity in the brain by providing evidence on the subunit arrangement, architecture, and activation mechanism of GluN1-N2-N3A tri-NMDAR. However, the evidence supporting the conclusions provides incomplete proof for the presence and functional properties of this NMDA receptor subtype. The work will be of broad interest to neuroscientists and biophysicists.

    2. Reviewer #1 (Public review):

      Summary:

      The previous evidence for NMDARs containing N1, N2, and N3 subunits (t-NMDARs) was weak. All previous results could be explained by mixtures of di-heteromeric receptors. The authors here set out to identify t-NMDARs both in vitro and in the brain.

      Strengths:

      The single-channel recording is quite convincing because the authors could reproduce previous results in their system, but could also then add new observations. It is quite hard (if not impossible) to obtain the N1-N2A-N3A result at 100 µM Glu/Gly from a mixture, because the N1-N2A diheteromer has such a high open probability. Therefore, any idea that this might be, in fact, two receptors (GluN1-N2A and GluN1-N3A) is trivially falsified. The authors might prefer to make this argument based on the reduction of open probability, which cannot be achieved from a mixture masquerading as a single channel.

      With regard to crosslinker usage in brain tissue, these are very impressive attempts, which I applaud. The fluorescence images of the brain sections look convincing. But the bands corresponding to N2-N3 crosslinked subunits from neurons or the brain are faint. I would want more information to be convinced that these faint bands come from GluN2-N3 dimers.

      Weaknesses:

      In the first part of the paper, where the CryoEM structure is determined, it's not really clear to me the extent to which Fab binding might bias the position of the ATDs (and even then the arrangement of each subunit within the whole complex). Then, much later at the end of the results, there is a structural analysis that claims to be integrative (Figure 7) but does not obviously rely on any other data than the structures, but does mention this point about the Fabs. The results could be rearranged to make these points clearer.

      I have my biggest doubts about the crosslinking of native receptors. For the biochemistry from neurons or brain tissue, this is a very ambitious idea that has been hard to execute over the past 15-20 years. The authors use AzF for the obvious reason that this was done before in NMDARs. The constructs that have been assembled are neat. But AzF is a really bad crosslinker. The authors attribute the weak bands to subunit mobility, but the minor abundance is more likely due to the strong constraints on AzF crosslinking and its unsuitable photochemistry in general (very easily activated with room light, for example).

      There is no information at all given about the wavelength, intensity, duration of UV exposure, and how, for example, the right exposure was determined. How were the samples protected in between?

    3. Reviewer #2 (Public review):

      Summary:

      The authors purified and solved by cryo-EM a structure of tri-heteromeric GluN1/GluN2A/GluN3A NMDA receptors, whose existence has long been contentious. Using patch-clamp electrophysiology on GluN1/GluN2/GluN3A NMDARs reconstituted into liposomes, they characterized the function of this NMDAR subtype. Finally, thanks to site-targeted crosslinking using unnatural amino acid incorporation, they show that the GluN2A subunit can crosslink with the GluN3A subunit in a cellular context, both in recombinant systems (HEK cells) and neuronal cultures and in vivo.

      Strengths:

      The NMDAR GluN3 subunit is a glycine-binding subunit that was long thought to assemble into GluN1/GluN2/GluN3 tri-heteromeric receptors during development, acting as a brake for synaptic development. However, several studies based on single subunit counting (Ulbrich et al., PNAS 2008) and ex vivo/in vivo electrophysiology have challenged the existence of these tri-heteromers (see Bossi, Pizzamiglio et al., Trends Neurosci. 2023). A large part of the controversy stems from the difficulty in isolating the tri-heteromeric population from their di-heteromeric counterparts, which led to a lack of knowledge on the biophysical and pharmacological properties of putative GluN1/GluN2/GluN3 receptors. To counteract this problem, the authors used a two-step purification method - first with a strep-tag attached to the GluN3 subunit, then with a His tag attached to the GluN2 subunit - to isolate GluN1/GluN2/GluN3 tri-heteromers from GluN1/GluN2A and GluN1/GluN3 di-heteromers, and they did observe these entities in Western blot and FSEC. They solved a cryo-EM structure of this NMDAR subtype using specific Fabs to identify the GluN1 and GluN2A subunits, showing an asymmetrical, splayed architecture. Then, they reconstituted the purified receptors in lipid vesicles to perform single-channel electrophysiological recordings. Finally, in order to validate the tri-heteromeric arrangement in a cellular system, they performed photocrosslinking experiments between the GluN2A and GluN3 subunits. For this purpose, a photoactivatable unnatural amino acid (AzF) was incorporated at the bottom of GluN2A NTD, a region embedded within the receptor complex that is predicted to be in close proximity to the GluN3 subunit.
      This is an elegant approach to validate the existence of GluN1/GluN2/GluN3 tri-hets, since at the chosen AzF incorporation position, crosslinking between GluN2A and GluN3 is more likely to reflect interaction of subunits within the same receptor complex than between two receptors. They show crosslinking between GluN2A and GluN3 in the presence of AzF and UV light, but not if UV light or AzF were not provided, suggesting that GluN2A and GluN3 can indeed be incorporated in the same complex. In a further attempt to demonstrate the physiological relevance of these tri-heteromers, they performed the same crosslinking experiments in cultured neurons and even native brain samples. While unnatural amino acid incorporation is now a well-established technique in vitro, such an approach is very difficult to implement in vivo. The technical effort put into the validation of the presence of these tri-heteromers in vivo should thus be commended.

      Overall, all the strategies used by this paper to prove the existence of GluN1/GluN2/GluN3 tri-heteromers, and investigate their structure and function, are well-thought-out and very elegant. But the current data do not fully support the conclusions of the paper.

      Weaknesses:

      All the experiments aiming at proving the existence of GluN1/GluN2/GluN3 tri-heteromers rely on the purification of these receptors from whole cell extracts. There is therefore no proof that these receptors are expressed at the membrane and are functional. This is a limitation that has been overlooked and should be discussed in the manuscript. In addition, in the current manuscript state, each demonstration suffers from caveats that do not allow for a firm conclusion about the existence and the properties of this receptor subtype.

      (1) In Cryo-EM images of GluN1/GluN2A/GluN3A receptors, the GluN3 subunit is identified as the subunit having no Fab bound to it. How can the authors be sure that this is indeed the GluN3A subunit and not a GluN2A subunit that has not bound the Fab? Does the GluN3A subunit carry features that would allow distinguishing it independently of Fab binding? In addition, it is surprising that the authors did not incubate the tri-heteromers with a Fab against GluN3A, since Extended Figure 3 shows that such a Fab is available.

      (2) Whether the single-channel recordings reflect the activity of GluN1/GluN2/GluN3 tri-heteromers is not convincing. Indeed, currents from liposomes containing these tri-heteromers have two conductance levels that correspond to the conductances of the corresponding di-heteromers. There is therefore a need for additional proof that the measured currents do not reflect a mixture of currents from N1/2A di-heteromers on one side, and N1/3A di-heteromers on the other side. What is the purity of the N1/3A sample? Indeed, given the high open probability and high conductance of N1/2A di-heteromers, even a small fraction of them could significantly contribute to the single-channel currents. Additionally, although the authors show no current induced by 3 µM glycine alone on proteoliposomes with the N1/2A/3A prep (no stats provided, though), given the sharp dependence of N1/3A currents on glycine concentration, this control alone cannot rule out the presence of contaminant N1/3A dihets in the preparation.

      Finally, pharmacological characterization of these tri-heteromers is lacking. In vivo, the presence of GluN1/GluN2/GluN3 tri-heteromers was inferred from recordings of NMDARs activated by glutamate but with low magnesium sensitivity. What is the effect of magnesium on N1/2A/3A currents? Does APV, the classical NMDAR antagonist acting at the glutamate site, inhibit the tri-heteromers? What is the effect of CGP-78608, which inhibits GluN1/GluN2 NMDARs but potentiates GluN1/GluN3 NMDARs? Such pharmacological characterization is critical to validate that the measured currents are indeed carried by a tri-heteromeric population, and would also be very important to identify such tri-heteromers in native tissues.

      (3) Validation of GluN1/GluN2/GluN3 tri-heteromer expression by photocrosslinking: The mixture of constructs used (full-length or CTD-truncated, with or without tags) is confusing, and it is difficult to track the correct molecular weight of the different constructs. In Figure 6, the band corresponding to a putative GluN3/GluN2A dimer is very weak. In addition, given the differences in molecular weights between the GluN2 subunits and GluN3, we would expect the band corresponding to a GluN2A/GluN2B dimer to migrate differently from the GluN2A/GluN3 dimer, but all high molecular weight bands seem to be at the same level in the blot. Finally, in the source data, the blots display additional bands that were dismissed by the authors without justification. In short, better clarification of the constructs and more careful interpretation of the blots are necessary to support the conclusions claimed by the authors.

    1. eLife Assessment

      This important study sought to investigate the role that early childhood malaria exposure plays in the development of antibody responses to unrelated pathogens and vaccine-derived antigens in Kenyan children. In this natural experiment, the authors compare antibody levels among children who have been exposed to different levels of malaria transmission by using protein microarray technology. Although the findings are of importance, the evidence remains incomplete, and the analysis would benefit from a more in-depth evaluation of potential confounders. With the appropriate analysis, the findings will be of great interest for global health, immunology, and vaccine development.

    2. Reviewer #1 (Public review):

      Summary:

      The study shows that childhood malaria can weaken the antibody response to other vaccines and infections. This suggests that early exposure to P. falciparum may have a long-lasting effect on immunity, with implications for vaccine efficacy in endemic areas.

      Strengths:

      This study stands out for its longitudinal design, the use of robust immunological techniques, and the comparison between areas with different levels of malaria exposure. Its findings reveal that early malaria can weaken the response to childhood vaccines, with important implications for public health in endemic regions.

      Weaknesses:

      One of the study's main limitations is the lack of functional data confirming the clinical impact of the low antibody levels. Furthermore, although multiple immune responses were measured, other important components, such as cellular immunity, were not assessed. Furthermore, the results may not be generalizable to other regions.

    3. Reviewer #2 (Public review):

      Summary:

      The authors investigated whether early-life malaria exposure has long-term effects on immune responses to unrelated antigens. They leveraged a natural experiment in coastal Kenya where two adjacent communities (Junju and Ngerenya) experienced divergent malaria transmission patterns after 2004. Using 15 years of longitudinal data from 123 children with weekly malaria surveillance and annual serological sampling, they measured antibody responses to multiple pathogens using a protein microarray technology and ELISA.

      Strengths:

      (1) Extensive longitudinal data collection with weekly malaria surveillance, enabling precise exposure classification.

      (2) Use of a natural experiment design that allows for causal inference about malaria's immunological effects.

      (3) Broad panel of antigens tested, demonstrating generalized rather than antigen-specific effects.

      (4) Within-cohort analysis in Ngerenya controls for geographic and environmental factors.

      (5) Validation of key findings using both serologic microarray and ELISA.

      (6) Important public health implications for vaccine strategies in malaria-endemic regions.

      Weaknesses:

      (1) Lack of participants' characteristics (socio-economic, nutritional, physical).

      (2) Somewhat limited sample size (longitudinal analysis of 123 children total), with further subdivision reducing statistical power for some analyses.

      (3) Potential confounding by unmeasured socioeconomic, nutritional, or environmental factors between communities.

      (4) Lack of ability to determine the direction of the associations found between malaria exposure and other IgG levels to unrelated pathogens.

      (5) Despite good longitudinal data, the main analysis was conducted as a cross-sectional analysis at age 10 for many comparisons, which limits the understanding of temporal dynamics.

      (6) Statistical analysis is limited to univariable comparisons without consideration for confounders or adjusting for multiple comparisons.

      (7) No mechanistic understanding of how early malaria exposure creates lasting immunosuppression.

      (8) No understanding of the clinical implications of the reduced IgG levels observed in the area with high malaria exposure.

      Assessment of Claims:

      The data appear to support the authors' primary claims, but the strength of the evidence is limited, and the results should be interpreted with caution. Together with the currently available evidence of P. falciparum's impact on the host's immune function, this natural experiment design provides further evidence for a relationship between early malaria exposure and reduced antibody responses. The within-Ngerenya analysis controls for geographic factors and thus enhances the quality of the evidence; however, it still fails to account for the physical, nutritional, and socio-economic factors that may have driven the observed changes. Additionally, the mechanism underlying this effect remains unclear, and the clinical significance of reduced antibody levels is not established.

      Impact and Utility:

      This work has fundamental implications for understanding vaccine effectiveness in malaria-endemic regions and may contribute to informing vaccination strategies. The findings, if strengthened, would suggest that children in areas of high malaria transmission may require modified immunization approaches. The dataset provides a valuable resource for future studies of malaria's immunological legacy.

      Context:

      This study builds on prior work showing acute immunosuppressive effects of malaria but uniquely attempts to demonstrate the durability of these effects years after exposure. The natural experiment design addresses limitations of previous observational studies by providing a more controlled comparison.

    1. eLife Assessment

      This important work combines theoretical analysis with precise experimental perturbation to demonstrate that the Wnt signaling pathway is characterized by anti-resonance, or a suppression of pathway output at intermediate activation frequencies. The authors identify an anti-resonance behavior, with compelling evidence from optogenetic stimulation in multiple cell types, alongside modeling results that corroborate the phenomenon. While the demonstration of this phenomenon has yet to be extended to fully physiological situations, its clear existence within optogenetically stimulated systems shows that it is likely a significant factor that contributes to the behavior of this central signaling pathway.

    2. Reviewer #1 (Public review):

      Summary:

      This report demonstrates that the gene expression output of the Wnt pathway, when controlled precisely by a synthetic light-based input, depends substantially on the frequency of stimulation. The particular frequency-dependent trend that is observed - anti-resonance, a suppression of target gene expression at intermediate frequencies given a constant duty cycle - is a novel aspect that has not been clearly shown before for this or other signaling pathways. The paper provides both clear experimental evidence of the phenomenon with engineered cellular systems and a model-based analysis of how the pairing of rate constants in pathway activation/deactivation could result in such a trend.

      Strengths:

      This report couples in vitro experimental data with an abstracted mathematical model. Both of these approaches appear to be technically sound and to provide consistent and strong support for the main conclusion. The experimental data are particularly clear, and the demonstration that Brachyury expression is subject to anti-resonance in ESCs is particularly compelling. The modeling approach is reasonably scaled for the system at the level of detail that is needed in this case, and the hidden variable analysis provides some insight into how the anti-resonance works.

      Weaknesses:

      (1) The anti-resonance phenomenon has not been demonstrated using physiological Wnt ligands; however, I view this as only a minor weakness for an initial report of the phenomenon. The potential significance of the phenomenon for Wnt outweighs the amount of effort it would take to carry the demonstration further - testing different frequencies/duty cycles at the level of ligand stimulus using microfluidics could get quite involved, and would likely take quite some time. Adding some more discussion about how the time scales of ligand-receptor binding could play into the reduced model would further ameliorate this issue.

      (2) While the model is fully consistent with the data, it has not been validated using experimental manipulations to establish that the mechanisms of the cell system and the model are the same. There may be some ways to make such modifications, for example, using a proteasome inhibitor. An alternative would be to more explicitly mention the need to validate the model's mechanism with experiments.

      (3) I think the manuscript misses an opportunity to discuss the potential of the phenomenon in other pathways. The hedgehog pathway, for example, involves GSK3-mediated partial proteolysis of a transcription factor, which could conceivably be subject to similar behaviors, and there are certainly other examples as well.

      (4) Some aspects of the modeling and hidden variable analysis are not optimally presented in the main text, although when considered together with the Supplemental Data, there are no significant deficiencies.

    3. Reviewer #2 (Public review):

      Summary:

      By combining optogenetics with theoretical modelling, the authors identify an anti-resonance behavior in the Wnt signaling pathway. This behavior is manifested as a minimal response at a certain stimulation frequency. Using an abstracted hidden variable model, the authors explain their findings by a competition of timescales. Furthermore, they experimentally show that this anti-resonance influences the cell fate decision involved in human gastrulation.

      Strengths:

      (1) This interdisciplinary study combines precise optogenetic manipulation with advanced modelling.

      (2) The results are directly tested in two different systems: HEK293T cells and H9 human embryonic stem cells.

      (3) The model is implemented based on previous literature and has two levels of detail: i) a detailed biochemical model and ii) an abstract model with a hidden parameter.

      Weaknesses:

      (1) While the experiments provide both single-cell data and population data, the model only considers population data.

      (2) Although the model captures the experimental data for TopFlash very well, the beta-Cat curves (Figure 2B) are only described qualitatively. This discrepancy is not discussed.

      Overall Assessment:

      The authors convincingly identified an anti-resonance behavior in a signaling pathway that is involved in cell fate decisions. The focus on a dynamic signal and the identification of such a behavior is important. I believe that the model approach of abstracting a complicated pathway with a hidden variable is an important tool to obtain an intuitive understanding of complicated dependencies in biology. Such a combination of precise optogenetic manipulation with effective models will provide a new perspective on causal dependencies in signaling pathways and should not be limited only to the system that the authors study.

    1. eLife Assessment

      This fundamental study presents a new method for longitudinally tracking cells in two-photon imaging data that addresses the specific challenges of imaging neurons in the developing cortex. It provides compelling evidence demonstrating reliable longitudinal identification of neurons across the second postnatal week in mice. The study should be of interest to developmental neuroscientists engaged in population-level recordings using two-photon imaging.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript presents a compelling and innovative approach that combines Track2p neuronal tracking with advanced analytical methods to investigate early postnatal brain development. The work provides a powerful framework for exploring complex developmental processes such as the emergence of sensory representations, cognitive functions, and activity-dependent circuit formation. By enabling the tracking of the same neurons over extended developmental periods, this methodology sets the stage for mechanistic insights that were previously inaccessible.

      Strengths:

      (1) Innovative Methodology:

      The integration of Track2p with longitudinal calcium imaging offers a unique capability to follow individual neurons across critical developmental windows.

      (2) High Conceptual Impact:

      The manuscript outlines a clear path for using this approach to study foundational developmental questions, such as how early neuronal activity shapes later functional properties and network assembly.

      (3) Future Experimental Potential:

      The authors convincingly argue for the feasibility of extending this tracking into adulthood and combining it with targeted manipulations, which could significantly advance our understanding of causality in developmental processes.

      (4) Broad Applicability:

      The proposed framework can be adapted to a wide range of experimental designs and questions, making it a valuable resource for the field.

      Weaknesses:

      None major. The manuscript is conceptually strong and methodologically sound. Future studies will need to address potential technical limitations of long-term tracking, but this does not detract from the current work's significance and clarity of vision.

      Comments on revisions:

      I have no further requests. I think this is an excellent manuscript.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Majnik and colleagues introduces "Track2p", a new tool designed to track neurons across imaging sessions of two-photon calcium imaging in developing mice. The method addresses the challenge of tracking cells in the growing brain of developing mice. The authors showed that "Track2p" successfully tracks hundreds of neurons in the barrel cortex across multiple days during the second postnatal week. This enabled identification of the emergence of behavioral state modulation and desynchronization of spontaneous network activity around postnatal day 11.

      Strengths

      The authors have satisfactorily addressed the majority of our questions and comments, and the revisions substantially improve the manuscript. The expansion of Track2p to accept general NumPy array inputs makes the tool more accessible to researchers using different analysis pipelines. While the absence of benchmarking standards remains a limitation across the field, the release of the ground-truth dataset is an important step forward that will allow other researchers to evaluate and compare algorithms.

      Minor point

      (1) The authors tested the robustness of the algorithm across non-consecutive days. As expected, performance drops significantly under these conditions. We agree that this limitation reflects biological constraints due to brain growth rather than shortcomings of the algorithm itself. This is relevant for researchers planning to use Track2p for longitudinal imaging or benchmarking new algorithms, and we recommend including some of this information in the Supplementary Information along with a brief discussion.

      Comments on revisions:

      We acknowledge the extended documentation for using Track2p and converting between Suite2p outputs and NumPy arrays. This addition is of great utility. We would also suggest further expanding the documentation for the NumPy array implementation, as we ran into some errors when testing this feature using NumPy arrays generated from deltaF traces, TIFF FOVs, and Cellpose masks.

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript, Majnik et al. developed a computational algorithm to track individual developing interneurons in the rodent cortex at postnatal stages. Considerable development in cortical networks takes place during the first postnatal weeks; however, tools to study them longitudinally at a single-cell level are scarce. This paper provides a valuable approach to study both single-cell dynamics across days and state-driven network changes. The authors used Gad67Cre mice together with virally introduced TdTom to track interneurons based on their anatomical location in the FOV and AAVSynGCaMP8m to follow their activity across the second postnatal week, a period during which the cortex is known to undergo marked decorrelation in spontaneous activity. Using Track2p, the authors show the feasibility of tracking populations of neurons in the same mice, capturing with their analysis previously described developmental decorrelation and uncovering stable representations of neuronal activity, coincident with the onset of spontaneous active movement. The quality of the imaging data is compelling, and the computational analysis is thorough, providing a widely applicable tool for the analysis of emerging neuronal activity in the cortex. Below are some points for the authors to consider.

      Major points

      The authors use a viral approach to label cortical interneurons. It is unclear how Track2p will perform in dense networks of excitatory cells using GCaMP transgenic mice.

      The authors used 20 neurons to generate a ground truth data set. The rationale for this sample size is unclear. Figure 1 indicates capability to track ~728 neurons. A larger ground truth data set will increase the robustness of the conclusions.

      It is unclear how movement was scored in the analysis shown in Fig 5A. Was the time that the mouse spent moving scored after visual inspection of the videos? Were whisker and muscle twitches scored as movement or was movement quantified as amount of time in which the treadmill was displaced?

      The rationale for binning the data analysis in early P11 is unclear. As the authors acknowledged, it is likely that the decoder captured active states from P11 onwards. Because active whisking begins around P14, it is unlikely to drive this change in network dynamics at P11. Does pupil dilation in the pups change during locomotor and resting states? Does the arousal state of the pups abruptly change at P11?

      Comments on revisions:

      The authors have carefully addressed all my comments. This is an interesting paper.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      We thank the reviewer for very enthusiastic and supportive comments on our manuscript. 

      Summary:

      This manuscript presents a compelling and innovative approach that combines Track2p neuronal tracking with advanced analytical methods to investigate early postnatal brain development. The work provides a powerful framework for exploring complex developmental processes such as the emergence of sensory representations, cognitive functions, and activity-dependent circuit formation. By enabling the tracking of the same neurons over extended developmental periods, this methodology sets the stage for mechanistic insights that were previously inaccessible.

      Strengths:

      (1) Innovative Methodology:

      The integration of Track2p with longitudinal calcium imaging offers a unique capability to follow individual neurons across critical developmental windows.

      (2) High Conceptual Impact:

      The manuscript outlines a clear path for using this approach to study foundational developmental questions, such as how early neuronal activity shapes later functional properties and network assembly.

      (3) Future Experimental Potential:

      The authors convincingly argue for the feasibility of extending this tracking into adulthood and combining it with targeted manipulations, which could significantly advance our understanding of causality in developmental processes.

      (4) Broad Applicability:

      The proposed framework can be adapted to a wide range of experimental designs and questions, making it a valuable resource for the field.

      Weaknesses:

      No major weaknesses were identified by this reviewer. The manuscript is conceptually strong and methodologically sound. Future studies will need to address potential technical limitations of long-term tracking, but this does not detract from the current work's significance and clarity of vision.

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Majnik and colleagues introduces "Track2p", a new tool designed to track neurons across imaging sessions of two-photon calcium imaging in developing mice. The method addresses the challenge of tracking cells in the growing brain of developing mice. The authors showed that "Track2p" successfully tracks hundreds of neurons in the barrel cortex across multiple days during the second postnatal week. This enabled the identification of the emergence of behavioral state modulation and desynchronization of spontaneous network activity around postnatal day 11.

      Strengths:

      The manuscript is well written, and the analysis pipeline is clearly described. Moreover, the dataset used for validation is of high quality, considering the technical challenges associated with longitudinal two-photon recordings in mouse pups. The authors provide a convincing comparison of both manual annotation and "CellReg" to demonstrate the tracking performance of "Track2p". Applying this tracking algorithm, Majnik and colleagues characterized hallmark developmental changes in spontaneous network activity, highlighting the impact of longitudinal imaging approaches in developmental neuroscience. Additionally, the code is available on GitHub, along with helpful documentation, which will facilitate accessibility and usability by other researchers.

      Weaknesses:

      (1) The main critique of the "Track2p" package is that, in its current implementation, it is dependent on the outputs of "Suite2p". This limits adoption by researchers who use alternative pipelines or custom code. One potential solution would be to generalize the accepted inputs beyond the fixed format of "Suite2p", for instance, by accepting NumPy arrays (e.g., ROIs, deltaF/F traces, images, etc.) from files generated by other software. Otherwise, the tool may remain more of a useful add-on to "Suite2p" (see https://github.com/MouseLand/suite2p/issues/933) rather than a fully standalone tool.

      We thank the reviewer for this excellent suggestion. 

      We have now implemented this feature: Track2p is now compatible with ‘raw’ NumPy arrays for the three types of inputs. For more information, please check the updated documentation: https://track2p.github.io/run_inputs_and_parameters.html#raw-npy-arrays. We have also tested this feature using a custom segmentation and trace extraction pipeline using Cellpose for segmentation.

      (2) Further benchmarking would strengthen the validation of "Track2p", particularly against "CaImAn" (Giovannucci et al., eLife, 2019), which is widely used in the field and implements a distinct registration approach.

      This reviewer suggested further benchmarking of Track2p. Ideally, we would want to benchmark Track2p against the current state-of-the-art method. However, the field currently lacks consensus on which algorithm performs best, with multiple methods available including CaImAn, SCOUT (Johnston et al. 2022), ROICaT (Nguyen et al. 2023), ROIMatchPub (recommended by Suite2p documentation and recently used by Hasegawa et al. 2024), and custom pipelines such as those described by Sun et al. 2025. The absence of systematic benchmarking studies, particularly for custom tracking pipelines, makes it impossible to identify the current state-of-the-art for comparison with Track2p. While comparing Track2p against all available methods would provide comprehensive evaluation, such an analysis falls beyond the scope of this paper.

      We selected CellReg for our primary comparison because it has been validated under similar experimental conditions—specifically, 2-photon calcium imaging in developing hippocampus between P17-P25 (Wang et al. 2024)—making it the most relevant benchmark for our developmental neocortex dataset.

      That said, to support further benchmarking in mouse neocortex (P8-P14), we will publicly release our ground truth tracking dataset.

      (3) The authors might also consider evaluating performance using non-consecutive recordings (e.g., alternate days or only three time points across the week) to demonstrate utility in other experimental designs.

      Thank you for your suggestion. We have performed a similar analysis prior to submission, but we decided against including it in the final manuscript, to keep the evaluation brief and to not confuse the reader with too many different evaluation methods. We have included the results in Author response images 1 and 2 below.

      To evaluate performance in experimental designs with larger time spans between recordings (>1 day), we performed an additional evaluation of tracking from P8 to each of the subsequent days while omitting the intermediate days (e.g., P8 to P9, P8 to P10 … P8 to P14). The performance for the three mice from the manuscript is shown below:

      Author response image 1.

      As expected, with increasing time difference between the two recordings, the performance drops significantly (dropping to effectively zero for 2 out of 3 mice). This could also explain why CellReg struggles to track cells across all days, since it takes P8 as a reference and attempts to register all subsequent days to that time point before matching, instead of performing registration and matching in consecutive pairs of recordings (P8-P9, P9-P10 … P13-P14) as we do.
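
      The difference between reference-based matching and matching in consecutive pairs can be illustrated with a small sketch. It is not Track2p's implementation: the function name `chain_matches` and the dictionary representation of pairwise matches are assumptions made for illustration only. Given one match table per consecutive pair of days, it links them into tracks and keeps only cells that are matched on every day.

```python
from typing import Dict, List


def chain_matches(pairwise: List[Dict[int, int]]) -> List[List[int]]:
    """Link per-pair matches into full tracks.

    pairwise[i] maps an ROI index on day i to its matched ROI index on
    day i + 1 (hypothetical representation). Returns one list of ROI
    indices per cell, covering all days, for cells matched in every pair.
    """
    if not pairwise:
        return []
    tracks = []
    for start in pairwise[0]:
        track = [start]
        for mapping in pairwise:
            nxt = mapping.get(track[-1])
            if nxt is None:
                # The chain is broken on this day, so the cell is dropped.
                break
            track.append(nxt)
        else:
            # The loop finished without a break: matched on all days.
            tracks.append(track)
    return tracks


# Three days, two consecutive pairs: ROI 0 is followed on all days,
# ROI 1 is lost between the second and third day.
example = [{0: 2, 1: 0}, {2: 1}]
full_tracks = chain_matches(example)
```

      Because each link only spans one day, every individual registration has to cope with one day of growth rather than the full P8-P14 interval; the cost is that a single missed link removes the cell from the final set.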

      Finally, for one of the three mice, we performed an additional test asking how much adding one intermediate recording day might rescue the P8-P14 tracking performance. This addresses the reviewer's comment directly: if only three days of recording are possible, which additional day gives the best tracking performance?

      Author response image 2.

      As can be seen from the plot, adding the P10 or P11 recording yields the largest improvement in tracking performance; however, the performance is still significantly lower than when including all days (see Fig. 4). This test suggests that including a day that is slightly skewed to earlier ages might improve the performance more than simply choosing the middle day between the two extremes. This would also be consistent with the qualitative observation that the FOV seems to show more drastic day-to-day changes at earlier ages in our recording conditions.

      Reviewer #3 (Public review):

      Summary:

      In this manuscript, Majnik et al. developed a computational algorithm to track individual developing interneurons in the rodent cortex at postnatal stages. Considerable development in cortical networks takes place during the first postnatal weeks; however, tools to study them longitudinally at a single-cell level are scarce. This paper provides a valuable approach to study both single-cell dynamics across days and state-driven network changes. The authors used Gad67Cre mice together with virally introduced TdTom to track interneurons based on their anatomical location in the FOV and AAVSynGCaMP8m to follow their activity across the second postnatal week, a period during which the cortex is known to undergo marked decorrelation in spontaneous activity. Using Track2P, the authors show the feasibility of tracking populations of neurons in the same mice, capturing with their analysis previously described developmental decorrelation and uncovering stable representations of neuronal activity, coincident with the onset of spontaneous active movement. The quality of the imaging data is compelling, and the computational analysis is thorough, providing a widely applicable tool for the analysis of emerging neuronal activity in the cortex. Below are some points for the authors to consider.

      We thank the reviewer for a constructive and positive evaluation of our manuscript.

      Major points:

      (1) The authors used 20 neurons to generate a ground truth dataset. The rationale for this sample size is unclear. Figure 1 indicates the capability to track ~728 neurons. A larger ground truth data set will increase the robustness of the conclusions.

      We think this was a misunderstanding of our ground truth dataset analysis, which included 192 neurons, not 20. As explained in the Methods section, manually tracking all cells would require a prohibitive amount of time, so we generated sparse manual annotations, tracking only a subset of cells from the first recording day onwards. To do this, we took the first recording (s0), defined a grid of 64 equidistant points over the FOV and, for each point, identified the closest ROI in terms of Euclidean distance from the median pixel of the ROI (see Fig. S3A). We then manually tracked these 64 ROIs across subsequent days. Only neurons that were detected and tracked across all sessions were taken into account and are referred to as our ground truth dataset (‘GT’ in Fig. 4). This was done for 3 mice, hence 3 × 64 neurons, and not 20, were used to generate our GT dataset.
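
      The seeding step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the 8 × 8 layout of the 64 points, the function names, and the representation of each ROI by the (row, col) coordinates of its median pixel are all assumptions.

```python
import math

def grid_points(height, width, n_per_side=8):
    """Return n_per_side**2 equidistant (row, col) points inside the FOV.

    Assumption: the 64 points form an 8 x 8 grid placed at the centres of
    equal tiles, so no point sits on the FOV border.
    """
    rows = [(i + 0.5) * height / n_per_side for i in range(n_per_side)]
    cols = [(j + 0.5) * width / n_per_side for j in range(n_per_side)]
    return [(r, c) for r in rows for c in cols]

def closest_roi(point, roi_centres):
    """Index of the ROI whose median pixel is nearest (Euclidean) to `point`."""
    return min(range(len(roi_centres)),
               key=lambda k: math.dist(point, roi_centres[k]))

def seed_rois(height, width, roi_centres, n_per_side=8):
    """One ROI index per grid point; repeats are possible where ROIs are sparse."""
    return [closest_roi(p, roi_centres)
            for p in grid_points(height, width, n_per_side)]
```

      Because several grid points can share the same nearest ROI, a real annotation set would need to handle repeats, for example by keeping unique indices only.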

      (2) It is unclear how movement was scored in the analysis shown in Figure 5A. Was the time that the mouse spent moving scored after visual inspection of the videos? Were whisker and muscle twitches scored as movement, or was movement quantified as the amount of time during which the treadmill was displaced?

      Movement was scored using a ‘motion energy’ metric as in Stringer et al. 2019 (V1) or Inácio et al. 2025 (S1). For each pair of consecutive frames in the videography recordings, this metric computes the difference between them as the sum of squared pixelwise differences between the two images. We have revised the main text and Methods to clarify this and avoid confusion.
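
      As a rough illustration of this metric (a sketch only, not the authors' implementation), the function below computes the sum of squared pixelwise differences for each pair of consecutive frames. The frame representation as nested lists and the toy 2 × 2 "video" are assumptions made for the example.

```python
def motion_energy(frames):
    """Sum of squared pixelwise differences between consecutive frames.

    frames: list of 2D lists (rows of pixel intensities), all the same shape.
    Returns len(frames) - 1 values, one per consecutive frame pair.
    """
    energy = []
    for prev, curr in zip(frames, frames[1:]):
        total = 0.0
        for row_prev, row_curr in zip(prev, curr):
            for a, b in zip(row_prev, row_curr):
                total += (b - a) ** 2
        energy.append(total)
    return energy

# Hypothetical 2x2 video: no change, then a change in every pixel,
# then a change in a single pixel.
frames = [
    [[0, 0], [0, 0]],
    [[0, 0], [0, 0]],
    [[3, 3], [3, 3]],
    [[3, 3], [3, 4]],
]
print(motion_energy(frames))  # [0.0, 36.0, 1.0]
```

      In the toy example, the frame-wide change produces a much larger value than the single-pixel change, which is the sense in which the metric weights global movements more heavily than local ones.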

      Since this metric quantifies global movements, it is inherently biased toward whole-body movements, which cause larger changes in pixel values across the whole FOV of the camera. Slight twitches of a single limb or of the whisker pad would thus contribute much less to this metric, since they are usually small displacements in a small region of the camera FOV. Additionally, comparing neural activity across all time points (using correlation or R²) also favours movements that last longer (such as wake movements or prolonged periods of high arousal), since each time point is treated equally.

      As we suggested in the Discussion, it would be interesting in further analyses to examine the link between twitches and neural activity, but this would likely require extensive manual scoring. We could then treat movements not as continuous across all time points but with an event-based analysis, for example peri-movement time histograms for different types of movements at different ages. This is, however, outside the scope of this study.

      (3) The rationale for binning the data analysis in early P11 is unclear. As the authors acknowledged, it is likely that the decoder captured active states from P11 onwards. Because active whisking begins around P14, it is unlikely to drive this change in network dynamics at P11. Does pupil dilation in the pups change during locomotor and resting states? Does the arousal state of the pups abruptly change at P11?

      We agree that P11 does not match any change in mouse behavior that we have been able to capture. However, arousal state in mice does change around postnatal day 11. This period marks a transition from immature, fragmented states to more organized and regulated sleep-wake patterns, along with increasing influence from neuromodulatory and sensory systems. All of these changes have been recently reviewed in Wu et al. 2024 (see also Martini et al. 2021). In addition, in the developing somatosensory system, before postnatal day 11 (P11), sensory feedback from wake-related movements (reafference) is actively gated and blocked by the external cuneate nucleus (ECN; Tiriac et al. 2016 and the excellent recent work from the Blumberg lab). This gating prevents sensory feedback from wake movements from reaching the cortex, ensuring that only sleep-related twitches drive neural responses. However, around P11, this gating mechanism abruptly lifts, enabling sensory signals from wake movements to influence cortical processing, signaling a dramatic developmental shift (Wu et al. 2024).

      Reviewer #1 (Recommendations for the authors):

      This manuscript represents a significant advancement in the field of developmental neuroscience, offering a powerful and elegant framework for longitudinal cellular tracking using the Track2p method combined with robust analytical approaches. The authors convincingly demonstrate that this integrated methodology provides an invaluable template for investigating complex developmental processes, including the emergence of sensory representations and higher cognitive functions.

      A major strength of this work is its emphasis on the power of longitudinal imaging to illuminate activity-dependent development. By tracking the same neurons over time, the authors open up new possibilities to uncover how early activity patterns shape later functional outcomes and the organization of neuronal assemblies, insights that would be inaccessible using conventional cross-sectional designs.

      Importantly, the manuscript highlights the potential for this approach to be extended even further, enabling continuous tracking into adulthood and thus offering an unprecedented window into long-term developmental trajectories. The authors also underscore the exciting opportunity to incorporate targeted perturbation experiments, allowing researchers to causally link early circuit dynamics to later outcomes.

      Given the increasing recognition that early postnatal alterations can underlie the etiology of various neurodevelopmental disorders, this work is especially timely. The methods and perspectives presented here are poised to catalyze a new generation of developmental studies that can reveal mechanistic underpinnings of both typical and atypical brain development.

      In summary, this is a technically impressive and conceptually forward-looking study that sets the stage for transformative advances in developmental neuroscience.

      Thank you for the thoughtful feedback—it's greatly appreciated!

      Reviewer #2 (Recommendations for the authors):

      Minor points:

      (1) Figure 1. Consider merging or moving to Supplemental, as its rationale is well described in the text.

      We would like to retain the current figure as we believe it provides an effective visual illustration of our rationale that will capture readers' attention and could serve as a valuable reference for others seeking to justify longitudinal tracking of the developing brain. We hope the reviewer will understand our decision.

      (2) Some axis labels and panels are difficult to read due to small font sizes (e.g. smaller panels in Figures 5-7).

      Modified, thanks.

      (3) Supplementary Figures. The order of appearance in the main text is occasionally inconsistent.

      This was modified, thanks.

      (4) Line 132. Add a reference to the registration toolbox used (elastix). A brief description of the affine transformation would also be helpful, either here or in the Methods section (p. 27).

      We have added a reference to Ntatsis et al. 2023 and described the affine transformation in the main text (lines 133-135):

      Firstly, we estimate the spatial transformation between s0 and s1 using affine image registration (i.e. allowing shifting, rotation, scaling and shearing, see Fig. 2B, the transformation is denoted as T).

      (5) Lines 147-151. If this method is adapted from another work, please cite the source.

      Computing the intersection over union of two ROIs for tracking is a widely established and intuitive method used across numerous studies, representing standard practice rather than requiring specific citation. We have however included the reference to the paper describing the algorithm we use to solve the linear sum assignment problem used for matching neurons across a pair of consecutive days (Crouse 2016).
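
      The matching step can be sketched as below, assuming the two sessions have already been registered and that each ROI is given as a set of (row, col) pixels. This is an illustration rather than the authors' implementation: it swaps the linear sum assignment solver cited in the response (Crouse 2016) for a brute-force search over permutations, which returns the same optimal matching but is only practical for a handful of ROIs. All function names and the toy ROIs are hypothetical.

```python
from itertools import permutations

def iou(roi_a, roi_b):
    """Intersection over union of two ROIs given as sets of (row, col) pixels."""
    union = len(roi_a | roi_b)
    return len(roi_a & roi_b) / union if union else 0.0

def match_rois(rois_day0, rois_day1):
    """One-to-one matching of day-0 to day-1 ROIs that maximises the summed IoU.

    Brute force over permutations (illustration only); assumes
    len(rois_day0) <= len(rois_day1).
    """
    scores = [[iou(a, b) for b in rois_day1] for a in rois_day0]
    best = max(
        permutations(range(len(rois_day1)), len(rois_day0)),
        key=lambda perm: sum(scores[i][j] for i, j in enumerate(perm)),
    )
    return [(i, j, scores[i][j]) for i, j in enumerate(best)]

def square(row0, col0, size):
    """Hypothetical helper: a size x size square ROI with its corner at (row0, col0)."""
    return {(row0 + r, col0 + c) for r in range(size) for c in range(size)}

day0 = [square(0, 0, 4), square(10, 10, 4)]
day1 = [square(11, 10, 4), square(0, 1, 4), square(30, 30, 4)]
for i, j, score in match_rois(day0, day1):
    print(f"day0 ROI {i} -> day1 ROI {j} (IoU = {score:.2f})")
```

      For realistic ROI counts, the brute-force step would be replaced by a dedicated assignment solver such as SciPy's `scipy.optimize.linear_sum_assignment`; whether the authors use that particular function is an assumption here. Matches with very low IoU would also typically be rejected, which this sketch does not do.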

      (6) Line 218. "classical" or automatic?

      We meant “classical” in the sense of widely used. 

      (7) Lines 220-231. Did the authors find significant variability of successfully tracked neurons across mice? While the data for successfully tracked cells is reported (Figure 5B), the proportions are not. Could differences in neuron dropout across days and mice affect the analysis of neuronal activity statistics?

      We thank the reviewer for raising this important point. We computed the fraction of successfully tracked cells in our dataset and found substantial variability:

      Cells detected on day 0: [607, 1849, 2190, 1988, 1316, 2138] 

      Proportion successfully tracked: [0.47, 0.20, 0.36, 0.37, 0.41, 0.19]

      Notably, the number of cells detected on the first day varies considerably (607–2190 cells). There appears to be a trend whereby datasets with fewer initially detected cells show higher tracking success rates, potentially because only highly active cells are identified in these cases.
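
      The trend mentioned above can be checked directly from the two lists reported in this response. The sketch below is an editorial illustration, not part of the authors' analysis: it computes a Pearson correlation between cells detected on day 0 and the proportion tracked, and reconstructs approximate tracked-cell counts from the rounded proportions. With only six mice, the result is descriptive rather than conclusive.

```python
import math

# Values copied from the response above (one entry per mouse).
detected = [607, 1849, 2190, 1988, 1316, 2138]        # cells detected on day 0
prop_tracked = [0.47, 0.20, 0.36, 0.37, 0.41, 0.19]   # proportion successfully tracked

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Approximate counts only: the proportions are rounded to two decimals.
n_tracked = [round(n * p) for n, p in zip(detected, prop_tracked)]
r = pearson_r(detected, prop_tracked)

print("approx. tracked cells per mouse:", n_tracked)
print("Pearson r (detected vs. proportion tracked):", round(r, 2))
```

      A negative coefficient is consistent with the described trend, but it does not distinguish between the possible explanations, such as activity-dependent detection.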

      To draw more definitive conclusions about the proportion of active cells and tracking dropout rates, we would require activity-independent cell detection methods (such as Cellpose applied to isosbestic 830 nm fluorescence, or ideally a pan-neuronal marker in a separate channel, e.g., tdTomato). We have incorporated the tracking success proportions into the revised manuscript.

      (8) Line 260. Please briefly explain, here or in the Methods, the rationale for using data from only 3 mice (rather than all 6) for evaluating tracking performance.

      We used three mice for this analysis due to the labor-intensive nature of manually annotating 64 ROIs across several days. Given the time constraints of this manual process, we determined that three subjects would provide adequate data to reliably assess tracking performance.

      (9) Line 277. Consider clarifying or rephrasing the phrase "across progressively shorter time intervals"? Do you mean across consecutive days?

      This has been rephrased as follows: 

      Additionally, to assess tracking performance over time, we quantified the proportion of reconstructed ground truth tracks over progressively longer time intervals (first two days, first three days etc. ‘Prop. correct’ in Fig. 4C-F, see Methods). This allowed us to understand how tracking accuracy depends on the number of successive sessions, as well as at which time points the algorithm might fail to successfully track cells.
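
      One way to read the ‘Prop. correct’ measure is sketched below. The exact definition is given in the authors' Methods, so the data layout (tracks as lists of per-session ROI ids keyed by the day-0 ROI) and the rule that a track counts as correct over the first k sessions only if every one of those sessions matches are assumptions for illustration.

```python
def prop_correct(gt_tracks, predicted_tracks):
    """For k = 2..n_sessions, fraction of GT tracks reproduced over the first k sessions.

    gt_tracks: dict mapping a day-0 ROI id to the list of ROI ids on each session.
    predicted_tracks: same layout; None marks a session where the track was lost.
    """
    n_sessions = len(next(iter(gt_tracks.values())))
    result = {}
    for k in range(2, n_sessions + 1):
        n_ok = sum(
            predicted_tracks.get(seed, [None] * n_sessions)[:k] == track[:k]
            for seed, track in gt_tracks.items()
        )
        result[k] = n_ok / len(gt_tracks)
    return result

# Hypothetical example with four manually tracked cells over four sessions.
gt = {0: [0, 5, 9, 2], 1: [1, 6, 3, 7], 2: [2, 4, 8, 1], 3: [3, 7, 1, 0]}
pred = {0: [0, 5, 9, 2], 1: [1, 6, 3, None], 2: [2, 4, 0, 1], 3: [3, 7, 1, 0]}
print(prop_correct(gt, pred))  # {2: 1.0, 3: 0.75, 4: 0.5}
```

      Under this reading the proportion can only stay flat or fall as k grows, so the session at which it drops shows where tracking tends to fail.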

      (10) Line 306. "we also provide additional resources and documentation". Please add a reference or link.

      Done, thanks

      Track2p  

      (11) Lines 342-344. Specify that the raster plots refer to one example mouse, not the entire sample.

      Done, thanks.

      (12) Lines 996-1002. Please confirm whether only successfully tracked neurons were used to compute the Pearson correlations between all pairs.

      Yes, only successfully tracked neurons were used, since these correlations cannot be computed for non-tracked pairs.

      (13) Line 1003. Add a reference to scikit-learn.

      Reference was added to: 

      Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, E. (2011). Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12, 2825–2830. 

      (14) Typos. Correct spacing between numeric values and units.

      We did not find many typos regarding spacing between the numerical value and the unit symbol (degrees and percent should not be spaced, right?).

      Reviewer #3 (Recommendations for the authors):

      The font size in many of the figures is too small. For example, it is difficult to follow individual ROIs in Figure S3.

      Figure font size has been increased, thanks. In Figure S3 there might have been a misunderstanding, since the three FOV images do not correspond to the FOV of the same mouse across three days but rather to the first recording for each of the three mice used in evaluation (the ROIs can thus not be followed across images since they correspond to a different mouse). To avoid confusion we have labelled each of the FOV images with the corresponding mouse identifier (same as in Fig. 4 and 5).

    1. eLife Assessment

      This is a valuable study that explores the role of the conserved transcription factor POU4-2 in the maintenance, regeneration, and function of planarian mechanosensory neurons. The authors present convincing evidence provided by gene expression and functional studies to demonstrate that POU4-2 is required for the maintenance and regeneration of mechanosensory neurons and mechanosensory function in planarians. Furthermore, the authors identify conserved genes associated with human auditory and rheosensory neurons as potential targets of this transcription factor.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors explore the role of the conserved transcription factor POU4-2 in planarian maintenance and regeneration of mechanosensory neurons. The authors explore the role of this transcription factor and identify potential targets of this transcription factor. Importantly, many genes discovered in this work are deeply conserved, with roles in mechanosensation and hearing, indicating that planarians may be a useful model with which to study the roles of these key molecules. This work is important within the field of regenerative neurobiology, but also impactful for those studying evolution of the machinery that is important for human hearing.

      Strengths:

      The paper is rigorous and thorough, with convincing support for the conclusions of the work.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors investigate the role of the transcription factor Smed-pou4-2 in the maintenance, regeneration and function of mechanosensory neurons in the freshwater planarian Schmidtea mediterranea. First, they characterize the expression of pou4-2 in mechanosensory neurons during both homeostasis and regeneration, and examine how its expression is affected by the knockdown of soxB1-2, a previously identified transcription factor essential for the maintenance and regeneration of these neurons. Second, the authors assess whether pou4-2 is functionally required for the maintenance and regeneration of mechanosensory neurons.

      Strengths:

      The study provides some new insights into the regulatory role of pou4-2 in the differentiation, maintenance, and regeneration of ciliated mechanosensory neurons in planarians.

    4. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review): 

      Summary: 

      In this manuscript, the authors explore the role of the conserved transcription factor POU4-2 in planarian maintenance and regeneration of mechanosensory neurons. The authors explore the role of this transcription factor and identify potential targets of this transcription factor. Importantly, many genes discovered in this work are deeply conserved, with roles in mechanosensation and hearing, indicating that planarians may be a useful model with which to study the roles of these key molecules. This work is important within the field of regenerative neurobiology, but also impactful for those studying the evolution of the machinery that is important for human hearing. 

      Strengths: 

      The paper is rigorous and thorough, with convincing support for the conclusions of the work. 

      Weaknesses: 

      Weaknesses are relatively minor and could be addressed with additional experiments or changes in writing.

      Reviewer #2 (Public review): 

      Summary: 

      In this manuscript, the authors investigate the role of the transcription factor Smed-pou4-2 in the maintenance, regeneration, and function of mechanosensory neurons in the freshwater planarian Schmidtea mediterranea. First, they characterize the expression of pou4-2 in mechanosensory neurons during both homeostasis and regeneration, and examine how its expression is affected by the knockdown of soxB1-2, a previously identified transcription factor essential for the maintenance and regeneration of these neurons. Second, the authors assess whether pou4-2 is functionally required for the maintenance and regeneration of mechanosensory neurons.

      Strengths: 

      The study provides some new insights into the regulatory role of pou4-2 in the differentiation, maintenance, and regeneration of ciliated mechanosensory neurons in planarians. 

      Weaknesses: 

      The overall scope is relatively limited. The manuscript lacks clear organization, and many of the conclusions would benefit from additional experiments and more rigorous quantification to enhance their strength and impact. 

      Reviewing Editor Comments: 

      (1) Quantification of pou4-2(+) cells that express (or do not express) hmcn-1-L and/or pkd1L-2(-) is a common suggestion amongst reviewers. It is recognized that Ross et al. (2018) showed that pkd1L-2 and hmcn-1-L expression is detected in separate cells by double FISH, and the analysis presented in Supplementary Figure S3 is helpful in showing that some cells expressing pou4-2 (magenta) are not labeled by the combined signal of pkd1L-2 and hmcn-1-L riboprobes (green). However, I am not sure that we can conclude that pkd1L-2 and hmcn-1-L are effectively detected when riboprobes are combined in the analysis. Therefore, quantification of labeled cells as proposed by Reviewers 1 and 2 would help.

      Combining riboprobes is a standard approach in the field, and we chose this method as a direct way to determine which cells lack expression of both genes. We agree that providing the raw quantification data would be helpful for readers, and we included this data in Supplementary File S7; the file contains the quantification information for this dFISH experiment represented in Supplementary Figure 3.

      (2) It may be helpful to comment on changes (or lack of changes) in atoh gene RNA levels in RNAseq analyses of pou4-2 animals. As mentioned by one of the reviewers, in situs that don't show signal are inconclusive in this regard. 

      We fully agree with both reviewers. Two of the planarian atonal homologs are difficult to detect and produce background signal, as we previously reported in Cowles et al., Development (2013). We performed the reciprocal RNAi/in situ experiments out of curiosity, given the reported role of atonal in the pou4 cascade in other organisms. However, these exploratory experiments lacked a strong rationale for inclusion, particularly given that pou4-2 and the atonal homologs do not share expression patterns, co-expression, or differential expression in our RNA-seq dataset. Therefore, we decided to omit the atonal in situs following pou4-2 RNAi. We retained the experiments showing that knockdown of the atonal genes does not have robust effects on the mechanosensory neuron pattern, as expected. We thank the reviewing editor and reviewers for pinpointing the concern. We agree that additional experiments, such as qPCR, would be needed to draw firm conclusions. We reasoned that, while these additional experiments could be informative, they are unlikely to substantially alter the key conclusions of this study.

      (3) There seem to be typos at bottom of Figure 10 and top of page 11 when referencing to Figure 4B (should be to 5B instead): "While mechanosensory neuronal patterned expression of Eph1 was downregulated after pou4-2 and soxB1-2 inhibition, low expression in the brain branches of the ventral cephalic ganglia persisted (Figure 4B)." 

      Thank you! We have fixed those.

      (4) Typo (page 13; kernel?): "...to test to what extent the Pou4 gene regulatory kernel is conserved among these widely divergent animals." 

      Regulatory kernels are defined as the minimal sets of interacting genes that drive developmental processes and are the core circuits within a gene regulatory network, but we recognize that this might not be as well known, so we have changed the term to “network” for clarity.

      Reviewer #1 (Recommendations for the authors): 

      (1) The authors indicate that they are interested in finding out whether POU4-2 is important in the creation of mechanosensory neurons in adulthood as well as in embryogenesis (in other words, whether the mechanism is "reused during adult tissue maintenance and regeneration"). The manuscript clearly shows that planarian POU4-2 is important in adult neurogenesis in planarians, but there is no evidence presented to show that this is a recapitulation of embryogenesis. Is pou4-2 expressed in the planarian embryo? This might be possible to examine by ISH or through the evaluation of sequencing data that already exists in the literature.

      We agree that these statements should be precise. We have clarified when we make comparisons to the role of Pou4 in sensory system development in other organisms versus its role in the adult planarian. We examined its expression using the existing database of embryonic gene expression. Thanks for hinting at this idea. We performed BLAST in Planosphere (Davies et al., 2017) to cross-reference our clone matching dd_Smed_v6_30562_0_1, which is identical to SMED30002016. The embryonic gene expression for SMED30002016 indicates this gene is expressed at the expected stages given prior knowledge of the timing of organ development in Schmidtea mediterranea (a positive trend begins at Stage 5, with a marked increase by Stage 6 that remains comparable to the asexual expression levels shown). We thank the reviewer for pointing out this oversight. We have incorporated this result in the paper as a Supplementary Figure and discuss how we can only speculate that it has a similar role as we detect in the adult asexual worms.

      (2) Can it be determined whether the punctate pou4-2+ cells outside of the stripes are progenitors or other neural cell types? Are there pou4-2+ neurons that are not mechanosensory cell types? Could there be other roles for POU4-2 in the neurogenesis of other cell types? It might help to show percentages of overlap in Figure 4A and discuss whether the two populations add up to 100% of cells. 

      These are good questions that arise in part from other statements that need clarification in the text (pointed out by Reviewer 2). We think some of the dorsal pou4-2+ cells might represent progenitor cells undergoing terminal differentiation (see Supplementary Figure 4). We attempted BrdU pulse-chase experiments but were not successful in consistently detecting pou4-2 at sufficient levels with our protocol. In response to this helpful comment, we have included this question as a future direction in the revised Discussion. Finally, we have edited our description of the expression pattern. We had already pointed out that there are other cells on the ventral side that are not affected when soxB1-2 is knocked down. We attempted to resolve the potential identity of those cells using existing scRNA-seq data in collaboration with colleagues, but their low abundance made it difficult to distinguish other populations. While we acknowledge this interesting possibility, we have chosen to focus this report on the role of pou4-2 downstream of soxB1-2, as this represents the most well-supported aspect of the dataset and was positively highlighted by both the reviewer and editor.

      (3) The authors discuss many genes from their analysis that play conserved roles in mechanosensation and hearing. Were there any conserved genes that came up in the analysis of pou4-2(RNAi) planarians that have not yet been studied in human hearing and neurodevelopment? I am wondering the extent to which planarians could be used as a discovery system for mechanosensory neuron function and development, and discussion of this point might increase the impact of this paper or provide critical rationale for expanding work on planarian mechanosensation. 

      Indeed, we agree that planarians could be used to identify conserved genes with roles in mechanosensation and have included this point in the Discussion. In this study, we have focused on demonstrating the conservation of gene regulation. While this study was initially based on a graduate thesis project, we have since generated a more comprehensive dataset from isolated heads, which we are currently analyzing. This has been emphasized in the revised Discussion.

      Minor: 

      (1) For Figure 6E, the authors could consider showing data along a negative axis to indicate a decrease in length in response to vibration and to more clearly show that this decrease doesn't occur as strongly after pou4-2(RNAi). 

      We displayed this behavior as the percent change, as this is a standard way to represent this data. As the percent change is a positive value, we represent the data as these positive values.

      (2) The authors should consider quantifying the decrease of pou4-2 mRNA after atonal(RNAi) conditions, either by RT-qPCR or cell quantification. Visually, the signal in the stripes after atoh8-2(RNAi) seems lower, particularly in the tail. The punctate pattern outside the stripes may also be decreased after atoh8-1(RNAi). But quantification might strengthen the argument. 

      We agree with the reviewer and acknowledge that we should have been more cautious in interpreting these results. Those two genes are difficult to detect and did not show specific patterns in Cowles et al. (2013). The reviewer is correct that additional experiments are necessary before reaching conclusions but, as discussed earlier, we do not think new experiments would provide insights relevant to the major conclusions. These experiments were exploratory in nature and tangential to our main conclusions, especially in the absence of reciprocal evidence (e.g., shared expression patterns, co-expression, or differential expression in our RNA-seq data). Therefore, we decided to eliminate the atonal in situs following pou4-2 RNAi.

      Reviewer #2 (Recommendations for the authors): 

      A. Expression of pou4-2 in ciliated mechanosensory neurons: 

      (1) The conclusion that pou4-2 is expressed in ciliated mechanosensory neurons is primarily based on co-expression analysis using a published single-cell dataset. Although the authors later show that a subset of pou4-2 cells also express pkd1L-2 (Figure 4A), a known marker of ciliated mechanosensory neurons, this finding is not properly quantified. I recommend moving Figure 4A to earlier in the manuscript (e.g., to Figure 2) and expanding the analysis to include additional known markers of this cell type. Proper quantification of the extent of co-localization is necessary to support the claim robustly. 

      As pointed out by the reviewer, there is substantive evidence from our lab and other reports. King et al. also showed pou4-2 and pkd1L-2 ‘regulation’ in their scRNA-seq data, and this function is conserved in the acoel Hofstenia miamia (Hulett et al., PNAS 2024). Our analysis shows convincing co-localization by scRNA-seq and expression of soxB1-2 and neural markers in the respective populations. Furthermore, we included colocalization of pou4-2 with mechanosensory genes using fluorescence in situ hybridization (Figure 3B, Supplementary Figure 4, and Supplementary File S7). We are confident the data conclusively show that pou4-2 regulates pkd1L-2 expression in a subset of mechanosensory neurons. Given the strength of existing observations and previously published data, we believe that additional staining experiments are not essential to support this conclusion.

      (2) There appears to be a conceptual inconsistency in the interpretation of pou4-2 expression dynamics. On one hand, the authors suggest that delayed pou4-2 expression indicates a role in late-stage differentiation (p.6). On the other hand, they propose that pou4-2 may be expressed in undifferentiated progenitors to initiate downstream transcriptional programs (p.8). These interpretations should be reconciled. Additionally, claims regarding pou4-2 expression in progenitor populations should be supported by co-localization with established stem cell or progenitor markers, rather than inferred from signal intensity alone. 

      This is an excellent point, and we agree with the reviewer that this section requires editing. As described in response to Reviewer 1, we attempted BrdU pulse chase experiments but were not successful in consistently detecting pou4-2 at sufficient levels with our protocol. Furthermore, we could not obtain strong signals in double labeling experiments in pou4-2 in situs combined with piwi-1 or PIWI-1 antibodies. We will include those experiments as a future direction and amend our conclusions accordingly.

      (3) The expression pattern shown in Figure 1B raises questions about the precise anatomical localization of pou4-2 cells. It is unclear whether these cells reside in the subepidermal plexus or the deeper submuscular plexus, which represent distinct neuronal layers (Ross et al., 2017). The observed signals near the ventral nerve cords could suggest submuscular localization. To clarify this, higher-resolution imaging and co-staining with region-specific neural markers are recommended. 

      In Ross et al. (2018), we showed that the pkd1L-2+ cells are located submuscularly. The pkd1L-2+ cells express pou4-2; thus, the pou4-2+ cells are in the same location. Based on co-expression data and co-expression with PKD genes, we are confident it is submuscular.

      B. The functional requirements of pou4-2 in the maintenance of mechanosensory neurons: 

      (1) To evaluate the functional role of pou4-2 in maintaining mechanosensory neurons, the authors performed whole-animal RNA-seq on pou4-2(RNAi) and control animals, identifying a significant downregulation of genes associated with mechanosensory neuron expression. However, the presentation of these findings is fragmented across Figures 3, 4, and 5. I recommend consolidating the RNA-seq results (Figure 3) and the subsequent validation of downregulated genes (Figures 4 and 5) into a single, cohesive figure. This would improve the logical flow and clarity of the manuscript. 

      As suggested by the reviewer, we have combined Figures 3 and 4 (new Figure 3), which we believe improves the flow. We decided to keep Figure 5 (new Figure 4) as a standalone because it focuses on the characterization of new genes revealed by RNA-seq and scRNA-seq data mining that were not previously reported in Ross et al. 2018 and 2024.

      (2) In pou4-2(RNAi) animals, pkd1L-2 expression appears to be entirely lost, while hmcn-1-L shows faint expression in scattered peripheral regions. The authors suggest that an extended RNAi treatment might be necessary to fully eliminate hmcn-1-L expression. However, an alternative explanation is that pou4-2 is not essential for maintaining all hmcn-1-L cells, particularly if pou4-2 expression does not fully overlap with that of hmcn-1-L. This possibility should be acknowledged and discussed. 

      We agree and have acknowledged this point in the revised text.

      (3) On page 9, the section title claims that "Smed-pou4-2 regulates genes involved in ciliated cell structure organization, cell adhesion, and nervous system development." While some differentially expressed genes are indeed annotated with these functions based on homology, the manuscript does not provide experimental evidence supporting their roles in these biological processes in planarians. The title should be revised to avoid overstatement, and the limitations of extrapolating a function solely from gene annotation should be acknowledged. 

      Excellent point. We have edited the text to indicate that the genes were annotated or implicated.

      (4) The cilia staining presented in Figure 6B to support the claim that pou4-2 is required for ciliated cell structure organization is unconvincing. Improved imaging and more targeted analysis (e.g., co-labeling with mechanosensory markers) are needed to support this conclusion. 

      We have addressed this concern by adjusting the language to be more precise and indicate that the stereotypical banded pattern is disrupted with decreased cilia labeling along the dorsal ciliated stripe. Indeed, our conclusion overstated the observations made with the staining and imaging resolution. Thank you.

      C. The functional requirements of pou4-2 in the regeneration of mechanosensory neurons: 

      To evaluate the role of pou4-2 in the regeneration of mechanosensory neurons, the authors performed amputations on pou4-2(RNAi) and control(RNAi) animals and assessed the expression of mechanosensory markers (pkd1L-2, hmcn-1-L) alongside a functional assay. However, the results shown in Figure 4B indicate the presence of numerous pkd1L-2 and hmcn-1-L cells in the blastema of pou4-2(RNAi) animals. This observation raises the possibility that pou4-2 may not be essential for the regeneration of these mechanosensory neurons. The authors should address this alternative interpretation. 

      Our interpretation is that there were very few cells expressing the markers compared to controls. The pattern was predominantly lost, which is consistent with other experiments shown in the paper. However, we have added the additional caveat suggested by the reviewer.

      Minor points: 

      (1) On p.8, the authors wrote "every 12 hours post-irradiation". However, this is not consistent with the figure, which only shows 0, 3, 4, 4.5, 5, and 5.5 dpi. 

      We corrected this. Thank you for catching the mistake!

      (2) On p.12, the authors wrote "Analysis of pou4-2 RNAi data revealed differentially expressed genes with known roles in mechanosensory functions, such as loxhd-1, cdh23, and myo7a. Mutations in these genes can cause a loss of mechanosensation/transduction". This is misleading because, to my knowledge, the role of these genes in planarians is unknown. If the authors meant other model systems, they should clearly state this in the text and include proper references. 

      The reviewer is correct that we are referencing findings from other organisms. We have clarified this point in the revised text. The appropriate references were included and cited in the first version.

      (3) On p.7, the authors wrote, "conversely, the expression of atonal genes was unaffected in pou4-2 RNAi-treated regenerates (Supplementary Figure S2B)". However, it is unclear whether the Atoh8-1 and Atoh8-2 signals are real, as the quality of the in situ results is too low to distinguish between real signals and background noise/non-specific staining. 

      This valid concern was addressed in our response to Reviewer 1. We have adjusted the figure and the text accordingly.

      (4) On p.6 the authors wrote "pinpointed time points wherein the pou4-2 transcripts were robustly downregulated". However, the current version of the manuscript does not provide data explaining why Pou4-2 transcripts are robustly downregulated on day 12. 

      Yes, we determined the appropriate time points using qPCR for all sample extractions. As an example, see the figure for qPCR validation at day 12 showing that pou4-2 and pkd1L-2 are downregulated.

      Author response image 1.

      In this graph, samples labeled “G” represent four biological controls of gfp(RNAi) control animals, and samples labeled “P” represent four biological controls of pou4-2(RNAi) animals at day 12 in the RNAi protocol.

      (5) On p.13, the authors wrote "collecting RNA from how animals." Is this a typo? 

      Thanks for catching the typo. It should read “whole” animals. We have corrected this.

      (6) On p.14, the authors wrote "but the expression patterns of planarian atonal genes indicated that they represent completely different cell populations from pou4-2-regulated mechanosensory neurons". However, this is unclear from the images, as the in situ stainings of Atoh8-1 and Atoh8-2 are potentially failed stainings. 

      We agree. We have edited accordingly.

    1. eLife Assessment

      This valuable manuscript presents an open-source and low-cost acoustic system for quantifying biting and chewing in mice. The approach is carefully validated against human observers, demonstrating strong methodological reliability and enabling high-resolution analysis of feeding microstructure. The tool has broad relevance for studies of appetite circuits and pharmacological interventions. A significant contribution is the identification of previously unrecognized "meal-related" neurons in the lateral hypothalamus, providing novel biological insight into food consumption. While the support for the methodological advances is compelling and robust, some circuit-level conclusions are preliminary or incomplete, relying on small pilot samples and manual classification, and should be interpreted with caution. This paper will be of interest to those interested in ingestive behavior and/or hypothalamus.

    2. Reviewer #1 (Public review):

      This is an interesting and valuable paper by Gil-Lievana, Arroyo et al. that presents an open-source method (the "Crunchometer") for quantifying biting and chewing behavior in mice using audio detection. The work addresses an important and unmet need in the field: quantitative measures of feeding behavior with solid foods, since most prior approaches have been limited to liquids. The authors make a clear and compelling case for why this problem is important, and I fully agree with their motivation.

      The system is carefully validated against human-scored video data and is shown to be at least as accurate, and in some cases more accurate, than human observers. This is a major strength of the study. I also particularly appreciate the demonstration of the technology in the context of LHA circuitry, which nicely illustrates its utility and importance for mechanistic studies of feeding. I also appreciate the ability to readily time-lock neural data to individual crunches. Overall, the manuscript is well-executed and represents a useful contribution to the field.

      The comments I have are largely minor and should be straightforward to address:

      (1) The authors should report sample sizes for all mouse cohorts, either alongside the statistics or in the figure legends for mean data.

      (2) Clarification is needed as to whether crunch detection fidelity is influenced by the hardness or softness of the food. The focus here is on standard pellets, with some additional high-fat pellet data, but it would be useful to know how generalizable the method is across different textures.

      (3) The authors should comment on how susceptible the Crunchometer is to background noise. For example, how well does it perform in the presence of white noise, experimenter movement, or other task-related sounds?

      (4) Chemogenetic activation of LHA GABAergic neurons is used. DREADD-based activation may strongly drive these neurons in a way that is not directly comparable to optogenetic or more physiological manipulations. While I do not think additional experiments are required, it would strengthen the discussion to briefly acknowledge this limitation.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript introduces the Crunchometer, a low-cost, open-source acoustic platform for monitoring the microstructure of solid food intake in mice. The Crunchometer is designed to overcome the limitations of existing methods for studying feeding behavior in rodents. The goal was to provide a tool that could precisely capture the microstructure of solid food intake, something often overlooked in favor of liquid-based assays, while being affordable, scalable, and compatible with neural recording techniques. By doing so, the authors aimed to enable detailed analysis of how physiological states, drugs, and specific neural circuits shape naturalistic feeding behaviors.

      Strengths:

      The study's strengths lie in its clear innovation, methodological rigor in validation against human annotation, and demonstration of broad utility across behavioral and neuroscience paradigms. The approach addresses a significant methodological gap in the field by moving beyond liquid-based feeding assays and provides an accessible tool for precisely dissecting ingestive behavior. The system is validated across multiple contexts, including physiological state (fed vs. fasted), pharmacological manipulation (semaglutide), and circuit-level interventions (chemogenetic activation of LH neurons), and is further shown to integrate seamlessly with both electrophysiology and calcium imaging.

      (1) Introduces a low-cost, open-source acoustic tool for measuring solid food intake, filling a critical gap left by expensive and proprietary systems.

      (2) Makes the method easily adoptable across labs with detailed setup instructions and shared benchmark datasets.

      (3) Provides high temporal precision for detecting bite events compared to human observers.

      (4) Successfully distinguishes feeding microstructure (bites, bouts, IBIs, gnawing vs. consumption) with greater objectivity than manual annotation.

      (5) Demonstrates compatibility with electrophysiology and calcium imaging, enabling fine-scale alignment of neural activity with feeding behavior.

      (6) Effectively discriminates between fed vs. fasted states, validating physiological sensitivity.

      (7) Captures the pharmacological effects of semaglutide, although this is really just reduced feeding and associated readouts (bouts, latency, etc).

      (8) Has potential to distinguish consummatory vs. non-consummatory behaviors (e.g., food spillage, gnawing); however, the current SVM model struggles to separate biting from gnawing due to similar acoustic profiles, and manual validation is still required.

      (9) Provides potential for closed-loop experiments.

      Weaknesses:

      Several limitations temper the strength of the conclusions: the supervised classifier still requires manual correction for gnawing, generalizability across different setups is limited, and the neuroscience findings, particularly calcium imaging of GABAergic and glutamatergic neurons, are based on small pilot samples. These issues do not undermine the value of the tool, but mean that the neural circuit findings should be interpreted as preliminary.

      (1) Some neuroscience findings (calcium imaging of GABAergic vs. glutamatergic neurons) are based on small pilot samples (n=2 mice per condition), limiting generalizability.

      (2) Chemogenetic and pharmacological experiments used small cohorts, raising statistical power concerns.

      (3) Correlation with actual food intake is modest and sometimes less accurate than human observers.

      (4) Sensitive to hoarding behavior, which can reduce detection accuracy and requires manual correction for misclassifications (e.g., tail movements, non-food noises). However, these limitations are discussed and not ignored.

      Conclusion:

      Overall, this is an exciting and impactful methodological advance that will likely be widely adopted in the field. I recommend minor revisions to clarify the limits of classifier generalizability, better contextualize the small-sample neuroscience findings as pilot data, and discuss future directions (e.g., real-time closed-loop applications).

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript provides detailed information on the construction of open-source systems to monitor ingestive behavior with low-cost equipment. Overall, this is a welcome addition to the arsenal of equipment that could be used to make measurements. The authors show interesting applications with data that reveal important neurophysiological properties of neurons in the lateral hypothalamus. The identification of previously unknown "meal-related" neurons in the LH highlights the utility of the device and is a novel insight that should spark further investigation on the LH. This manuscript and videos provide a wealth of useful information that should be a must-read for anyone in the ingestive behavior or hypothalamus fields.

      A scholarly introduction to the history and utility of various ways feeding is measured in rodents is provided. One point - the microstructure of eating solid food - has been studied extensively (for one of many studies, see https://doi.org/10.1371/journal.pone.0246569 ). However, I agree that the crunchometer will allow for more people to access recordings during food intake and temporally lock consummatory behavior to neural activity.

      Questions on results:

      (1) It is unclear why 10% sucrose solution was used as a liquid instead of water, given that the study is focusing on the solid food source.

      (2) It is unclear how essential the human verification is in the pipeline - results for Figure 1 keep referring to the verification as essential. Is that dispensable once the ML algorithms have been trained?

      (3) The ability to extrapolate food quantity consumed is limited, with high variability. This limitation does not undercut the utility of the crunchometer, but should be highlighted as one of the parameters that are not suitable for this system. This limitation should be added to the limitations section.

      (4) The ability to discriminate between gnawing and consummatory behavior is a strength (Figure 5), and these findings are important. However, it is unclear what can be made of mice that have 'gnawing' behavior in the fasted state (like in Figure 3). It seems they would need to be eliminated from the analysis with this tool?

      (5) Why is there a post-semaglutide fed group and not a fasted group in Figure 4? It seems both would have been interesting, as one could expect an effect on feeding even 24h after semaglutide treatment. This would help parse the preference better because the animals eat such a small amount on semaglutide, that it is hard to compare to the fasted condition with saline treatment.

      (6) The identification of 'meal-related' neurons in the LH is another strength of the manuscript. Although there is currently insufficient data, could similar recordings be used to give a neurophysiological definition of a 'meal' duration/size? Typically, these were somewhat arbitrarily defined behaviorally. Having a neural correlate to a 'meal' would be a powerful tool for understanding how meals are involved in overall caloric intake.

      (7) The conclusion in the title of Figure 8 is premature, given the pilot nature and small number of neurons and mice sampled.

      Conclusion:

      Overall, this report on the Crunchometer is well done and provides a valuable tool for all who study food intake and the behaviors around food intake. Clarification or answers to the points above will only further the utility and understanding of the tool for the research community. I am excited to see the future utility of this tool in emerging research.

    1. eLife Assessment

      This paper is an important overview of the currently published literature on low-intensity focused ultrasound stimulation (TUS) in humans, providing a meta-analysis of this literature that explores which stimulation parameters might predict the directionality of the physiological stimulation effects. The overall synthesis is convincing. The database proposed by the paper has the potential to become a key community resource if carefully curated and developed.

    2. Reviewer #1 (Public review):

      This paper is a relevant overview of the currently published literature on low-intensity focused ultrasound stimulation (TUS) in humans, with a meta-analysis of this literature that explores which stimulation parameters might predict the directionality of the physiological stimulation effects.

      The pool of papers to draw from is small, which is not surprising given the nascent technology. It seems, nevertheless, relevant to summarise the current field in the way done here, not least to mitigate and prevent some of the mistakes that other non-invasive brain stimulation techniques have suffered from, most notably the theory- and data-free permutation of the parameter space.

      A database summarising the literature and allowing for quantitative assessment of these studies is a key contribution of the paper. If curated well, it can become a valuable community resource.

      Comments on revisions:

      The paper is much improved. There remain a few caveats the authors may want to address.

      I'm not going to dwell on this if the authors don't agree, but I remain critical of the inclusion of TPS in the discussion. It's comparing apples and oranges, and unless there's a personal interest the authors have in TPS, it remains puzzling why it is included in the first place. As per my previous review, the literature on TPS, and especially the main example cited, has been highly criticised, including by national patient and medical associations. A mere disclaimer that more work is needed isn't enough, in this reviewer's opinion - I simply don't understand why the authors go out on a limb here when the rest of the paper is done so well and thoroughly.

    3. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Summary:

      This paper is a relevant overview of the currently published literature on low-intensity focused ultrasound stimulation (TUS) in humans, with a meta-analysis of this literature that explores which stimulation parameters might predict the directionality of the physiological stimulation effects.

      The pool of papers to draw from is small, which is not surprising given the nascent technology. It seems nevertheless relevant to summarize the current field in the way done here, not least to mitigate and prevent some of the mistakes that other non-invasive brain stimulation techniques have suffered from, most notably the theory- and data-free permutation of the parameter space.

      The meta-analysis concludes that there are, at best, weak trends toward specific parameters predicting the direction of the stimulation effects. The data have been incorporated into an open database that will ideally continue to be populated by the community and thereby become a helpful resource as the field moves forward.

      Strengths:

      The current state of human TUS is concisely and well summarized. The methods of the meta-analysis are appropriate. The database is a valuable resource.

      We thank the reviewer for their positive assessment of the revised manuscript and the potential importance of the resource to the TUS community. 

      Suggestions:

      The paper remains lengthy and somewhat unfocused, to the detriment of readability. One can understand that the authors wish to include as much information as possible, but this reviewer is sceptical that this will aid the use of the databank, or help broaden the readership. For one, there is a good chunk of repetition throughout. The intro also oscillates somewhat between TMS, tDCS and TUS. While the former two help contextualize the issue, this doesn't seem necessary. In the section on clinical applications of TUS and possible outcomes of TUS, there's an imbalance of the content across examples. That's in part because of the difference in knowledge base, but some sections could probably be shortened, e.g. stroke. In any case, the authors may want to consider whether it is worth making some additional effort in pruning the paper.

      We thank the reviewer for these suggestions. We have checked for redundancy and made the clinical review section more balanced, although some sections cover more TUS studies than others, so some imbalance is unavoidable. As one example, we have condensed the “Stroke and neuroprotection in brain injury” section (lines 624-647). This improves the clarity and readability of the manuscript.

      The terms or concept of enhancement and suppression warrant a clearer definition and usage. In most cases, the authors refer to E/S of neural activity. Perhaps using terms such as "neural enhancement" etc helps distinguish these from eg behavioural or clinical effects. Crucially, how one maps onto the other is not clear. But in any case, a clear statement that the changes outlined on lines 277ff do not

      We thank the reviewer for this point and agree that it is important to distinguish neural E/S, as we had intended, from behavioral effects. In the first instance and in several other places, we have added ‘neural’ before enhancement/suppression. Also see lines 276-279: “Probable net neural enhancement versus suppression was characterised as follows. Note that our use of the terms enhancement and suppression refers exclusively to the increase or decrease of neural activity, respectively, as measured by neurophysiological methods (EEG-ERPs, BOLD fMRI, etc.) and does not imply equivalent changes in behavioural responses.”

      Please see also lines 108-116.

      Re tb-TUS (lines 382ff), it is worth acknowledging here that independent replication is very limited (eg Bao et al 2024; Fong et al bioRxiv 2024) and seems to indicate rather different effects

      We have updated this section by referencing Bao et al. and Fong et al. as examples of the limited independent replication of tbTUS results. Please see lines 392-396: “However, independent replication of these findings remains limited. For example, Bao et al. found reduced motor cortex excitability, measured as decreased TMS-MEP amplitude in M1, that lasted up to 30 minutes post-sonication (Bao et al., 2024), whereas Fong et al. reported no significant effects between tbTUS and sham conditions in M1 excitability (Fong et al., 2024).”

      The comparison with TPS is troublesome. For one, that original study was incredibly poorly controlled and designed. Cherry-picking individual (badly conducted) proof-of-principle studies doesn't seem a great way to go about it, as one can find a match for any desired use or outcome. Moreover, other than the concept of "pulsed" stimulation, it is not clear why that original study would motivate the use of TUS in the way the authors propose; both types of stimulation act in very different ways (if TPS "acts" at all). But surely the cited TPS study does not "demonstrate the capability for TUS for pre-operative cognitive mapping". As an aside, why the authors feel the need to state the "potential for TPS... to enhance cognitive function" is unclear, but it is certainly a non-sequitur. This reviewer feels quite strongly that simplistic analogies such as the one here are unnecessary and misleading, and don't reflect the thoughtful discussion of the rest of the paper. In the other clinical examples, the authors build their suggestions on other TUS studies, which seems more sensible.

      This is an excellent point, and we have removed that statement, replacing it with: “However, TPS effects studies remain highly limited and would require further study and comparison to effects with other TUS protocols.” Please see lines 561-562. We thank the reviewer for the supportive comments on the rest of the review.

    1. eLife Assessment

      This important study addresses a topic that is frequently discussed in the literature but is under-assessed, namely correlations among genome size, repeat content, and pathogenicity in fungi. Contrary to previous assertions, the authors found that repeat content is not associated with pathogenicity. Rather, pathogenic lifestyle was found to be better explained by the number of protein-coding genes, with other genomic features associated with insect association status. The results are considered solid, although there remain concerns about potential biases stemming from the underlying data quality of the analyzed genomes.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript "Lifestyles shape genome size and gene content in fungal pathogens" by Fijarczyk et al. presents a comprehensive analysis of a large dataset of fungal genomes to investigate what genomic features correlate with pathogenicity and insect associations. The authors focus on a single class of fungi, due to the diversity of lifestyles and availability of genomes. They analyze a set of 12 genomic features for correlations with either pathogenicity or insect association and find that, contrary to previous assertions, repeat content does not associate with pathogenicity. They discover that the number of protein-coding genes, including total size of non-repetitive DNA, does correlate with pathogenicity. However, unique features are associated with insect associations. This work represents an important contribution to the attempts to understand what features of genomic architecture impact the evolution of pathogenicity in fungi.

      Strengths:

      The statistical methods appear to be properly employed and analyses thoroughly conducted. The size of the dataset is impressive and likely makes the conclusions robust. The manuscript is well written and the information, while dense, is generally presented in a clear manner.

      Weaknesses:

      My main concerns all involve the genomic data, how they were annotated, and the biases this could impart to the downstream analyses. The three main features I'm concerned with are sequencing technology, gene annotation, and repeat annotation. The authors have done an excellent investigation into these issues, but the results show concerning trends, and I am not as reassured as the authors are.

      The collection of genomes is diverse and includes assemblies generated from multiple sequencing technologies including both short- and long-read technologies. From the number of scaffolds it's clear that the quality of the assemblies varies dramatically, even within categories of long- and short-read. This is going to impact many of the values important for this study, as the authors show.

      I have considerable worries that the gene annotation methods could impart biases that significantly affect the main conclusions. Only 5 reference training sets were used for the Sordariomycetes and these are unequally distributed across the phylogeny. Augustus obviously performed less than ideally, as the authors observe in their extended analysis. While the authors are not concerned about phylogenetic distance from the training species, due to prevailing trends, I am not as convinced. In Figure S12, the Augustus features appear to have considerably more variation in values for the H2 set and possibly the Microascales. It is unclear how this would affect the conclusions in this study.

      Unfortunately, the genomes available from NCBI will vary greatly in the quality of their repeat masking. While some will have been masked using custom libraries generated with software like RepeatModeler, others will probably have been masked with public databases like Repbase. As public databases are again biased towards certain species (Fusarium is well represented in Repbase, for example), this could have significant impacts on estimating repeat content. Additionally, even custom libraries can be problematic, as some software (like RepeatModeler) will include multicopy host genes, leading to bona fide genes being masked if proper filtering is not employed. A more consistent repeat masking pipeline would add to the robustness of the conclusions. The authors show that there is a significant bias in their set.

      To a lesser degree I wonder what impact the use of representative genomes for a species has on the analyses. Some species vary greatly in genome size, repeat content and architecture among strains. I understand that it is difficult to address in this type of analysis, but it could be discussed.

    3. Reviewer #2 (Public review):

      Summary:

      In this paper, the authors report on the genomic correlates of the transition to the pathogenic lifestyle in Sordariomycetes. The pathogenic lifestyle was found to be better explained by the number of genes, and in particular effectors and tRNAs, but this was modulated by the type of interacting host (insect or not insect) and the ability to be vectored by insects.

      Strengths:

      The main strengths of this study lie in (i) the size of the dataset, and the potentially high number of lifestyle transitions in Sordariomycetes, (ii) the quality of the analyses and the quality of the presentation of the results, (iii) the importance of the authors' findings.

      Weaknesses:

      The weakness is a common issue in most comparative genomics studies in fungi, but it remains important and valid to highlight it. Defining lifestyles is complex because many fungi go through different lifestyles during their life cycles (for instance, symbiotic phases interspersed with saprotrophic phases). In many fungi, the lifestyle referenced in the literature is merely the sampling substrate (such as wood or dung), which does not necessarily mean that this substrate is a key part of the life cycle. The authors discuss this issue, but they do not eliminate the underlying uncertainties.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The manuscript "Lifestyles shape genome size and gene content in fungal pathogens" by Fijarczyk et al. presents a comprehensive analysis of a large dataset of fungal genomes to investigate what genomic features correlate with pathogenicity and insect associations. The authors focus on a single class of fungi, due to the diversity of lifestyles and availability of genomes. They analyze a set of 12 genomic features for correlations with either pathogenicity or insect association and find that, contrary to previous assertions, repeat content does not associate with pathogenicity. They discover that the number of protein-coding genes, including the total size of non-repetitive DNA, does correlate with pathogenicity. However, unique features are associated with insect associations. This work represents an important contribution to the attempts to understand what features of genomic architecture impact the evolution of pathogenicity in fungi.

      Strengths:

      The statistical methods appear to be properly employed and analyses thoroughly conducted. The manuscript is well written and the information, while dense, is generally presented in a clear manner.

      Weaknesses:

      My main concerns all involve the genomic data, how they were annotated, and the biases this could impart to the downstream analyses. The three main features I'm concerned with are sequencing technology, gene annotation, and repeat annotation.

      We thank the reviewer for all the comments. We are aware that the genome assemblies are of heterogeneous quality since they come from many sources. The goal of this study was to make the best use of the existing assemblies, with the assumption that noise introduced by the heterogeneity of sequencing methods should be overcome by the robustness of evolutionary trends and the breadth and number of analyzed assemblies. Therefore, at worst, we would expect a decrease in the power to detect existing trends. It is important to note that the only way to confidently remove all potential biases would be to sequence and analyze all species in the same way; this would require a complete study and is beyond the scope of the work presented here. Nevertheless, some biases could affect the results in a negative way, e.g., if they affect fungal lifestyles differently. We therefore made an attempt to explore the impact of sequencing technology and of the gene and repeat annotation approach among genomes of different fungal lifestyles. Details are described in Supplementary Results and below. Overall, even though the assembly size and annotations conducted with Augustus can sometimes vary compared to annotations from other resources, such as JGI Mycocosm, we do not observe a bias associated with fungal lifestyles. Comparison of annotations conducted with Augustus and the JGI Mycocosm dataset revealed variation in gene-related features that reflects biological differences rather than issues with annotation.

      The collection of genomes is diverse and includes assemblies generated from multiple sequencing technologies including both short- and long-read technologies. Not only has the impact of the sequencing method not been evaluated, but the technology is not even listed in Table S1. From the number of scaffolds it is clear that the quality of the assemblies varies dramatically. This is going to impact many of the values important for this study, including genome size, repeat content, and gene number.

      We have now added sequencing technology to Table S1, as reported in NCBI. We evaluated the impact of long-read (Nanopore, PacBio, Sanger) vs. short-read assemblies in Supplementary Results. In short, the proportions of different lifestyles (pathogenic vs. non-pathogenic, IA vs. non-IA) were the same for short- and long-read assemblies. Indeed, long-read assemblies were longer, had a higher fraction of repeats and fewer genes on average, but the differences between pathogenic vs. non-pathogenic (or IA vs. non-IA) species were in the same direction for the two sequencing technologies and in line with our results. There were some discrepancies, e.g., mean intron length was longer for pathogens with long-read assemblies, but slightly shorter on average for short-read assemblies (and to a lesser extent GC and pseudo tRNA count), which could explain weaker or mixed results in our study for these features.

      Additionally, since some filtering was employed for small contigs, this could also bias the results.

      The reason for setting a lower contig length threshold was that assemblies submitted to NCBI have varying lower-length thresholds. This is because assemblers do not output contigs below a certain length, and this threshold can be changed by the user. Setting a common minimum contig length was meant to remove this variation, knowing that any length cut-off will have a larger effect on short-read-based assemblies than on long-read-based assemblies. Notably, genome assemblies of corresponding species in JGI Mycocosm have a minimum contig length of 865 bp, not much lower than in our dataset. Importantly, in response to a comment from a previous reviewer, repeat content was recalculated on raw assembly lengths instead of on filtered assembly lengths.
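
      The effect of a common minimum contig length can be illustrated with a minimal sketch. The contig lengths and the 1000 bp cut-off below are hypothetical values chosen for illustration only; they are not taken from our dataset, and the function names are not part of any pipeline used in the study.

```python
def filter_contigs(contig_lengths, min_length):
    """Return the contig lengths (bp) that are at least min_length."""
    return [n for n in contig_lengths if n >= min_length]


def fraction_removed(contig_lengths, min_length):
    """Fraction of total assembly length lost by applying the cut-off."""
    total = sum(contig_lengths)
    kept = sum(filter_contigs(contig_lengths, min_length))
    return (total - kept) / total


# Hypothetical assemblies: a fragmented short-read assembly with several
# small contigs, and a long-read assembly dominated by large contigs.
short_read = [500, 800, 1200, 5000, 20000]
long_read = [900, 250000, 1200000]

# Hypothetical cut-off in bp (assumption, not the value used in the study).
MIN_LENGTH = 1000

loss_short = fraction_removed(short_read, MIN_LENGTH)
loss_long = fraction_removed(long_read, MIN_LENGTH)
```

      In this toy case the same cut-off removes a larger share of the short-read assembly than of the long-read assembly, which is the asymmetry described above.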

      I have considerable worries that the gene annotation methods could impart biases that significantly affect the main conclusions. Only 5 reference training sets were used for the Sordariomycetes and these are unequally distributed across the phylogeny. Augustus obviously performed less than ideally, as the authors reported that it under-annotated the genomes by 10%. I suspect it will have performed worse with increasing phylogenetic distance from the reference genomes. None of the species used for training were insect-associated, except for those generated by the authors for this study. As this feature was used to split the data it could impact the results. Some major results rely explicitly on having good gene annotations, like exon length, adding to these concerns. Looking manually at Table S1 at Ophiostoma, it does seem to be a general trend that the genomes annotated with Magnaporthe grisea have shorter exons than those annotated with H294. I also wonder if many of the trends evident in Figure 5 are also the result of these biases. Clades H1 and G each contain a species used in the training and have an increase in genes for example.

      We applied 6 different reference training sets (instead of one) precisely to address the problem of increasing phylogenetic distance of annotated species. To further investigate the impact of the species chosen for training, we plotted five gene features (number of genes, number of introns, intron length, exon length, fraction of genes with introns) as a function of branch length distance from the species (or genus) used as a training set for annotation. We don’t see systematic biases across different training sets. However, trends are very clear for clades annotated with fusarium. This set of species includes Hypocreales and Microascales, which is indeed unfortunate since Microascales is an IA group and at the same time the most distant from the Fusarium genus in this set. To clarify whether this trend is related to annotation bias or is a biological trend, we compared gene annotations with those of Mycocosm, between Hypocreales Fusarium species, Hypocreales non-Fusarium species, and Microascales, and we observe exactly the same trends in all gene features.

      Similarly, among species that were annotated with magnaporthe_grisea, Ophiostomatales (another IA group) are among the most distant from the training set species. Here, however, another order, Diaporthales, is similarly distant, yet the two orders display different feature ranges. In terms of exon length, the top two species in this training set include Ophiostoma, and they reach exon lengths similar to those of the Ophiostoma species annotated using H294 as a training set. In summary, it is possible that the choice of annotation species has some effect on feature values; however, in this dataset, these biases are likely mitigated by biological differences among lifestyles and clades.

      Unfortunately, the genomes available from NCBI will vary greatly in the quality of their repeat masking. While some will have been masked using custom libraries generated with software like Repeatmodeler, others will probably have been masked with public databases like repbase. As public databases are again biased towards certain species (Fusarium is well represented in repbase for example), this could have significant impacts on estimating repeat content. Additionally, even custom libraries can be problematic as some software (like RepeatModeler) will include multicopy host genes leading to bona fide genes being masked if proper filtering is not employed. A more consistent repeat masking pipeline would add to the robustness of the conclusions.

      We have searched for the same species in JGI Mycocosm and were able to retrieve 58 genome assemblies with matching species, with 19 of them belonging to the same strain as in our dataset. Overall we found no differences in genome assembly length. Interestingly, repeat content was slightly higher for NCBI genome assemblies compared to JGI Mycocosm assemblies, perhaps due to masking of host multicopy genes, as the reviewer mentioned. By comparing pathogenic and non-pathogenic species for the same 19 strains, we observe that JGI Mycocosm annotates fewer repeats in pathogenic species than Augustus annotations (but trends are similar when taking into account 58 matching species). Given a small number of samples, it is hard to draw any strong conclusions; however, the differences that we see are in favor of our general results showing no (or negative) correlation of repeat content with pathogenicity. 

      To a lesser degree, I wonder what impact the use of representative genomes for a species has on the analyses. Some species vary greatly in genome size, repeat content, and architecture among strains. I understand that it is difficult to address in this type of analysis, but it could be discussed.

      In our case, the use of protein sequences could underestimate divergence between closely related strains from the same species. We also excluded strains of the same species to avoid overrepresentation of closely related strains with similar lifestyle traits. We agree that some changes in the genome architecture can occur very rapidly, even at the species level, though analyzing the emergence of, e.g., pathogenicity at the population level would require a slightly different approach that accounts for population-level processes.

      Reviewer #2 (Public review):

      Summary:

      In this paper, the authors report on the genomic correlates of the transition to the pathogenic lifestyle in Sordariomycetes. The pathogenic lifestyle was found to be better explained by the number of genes, and in particular effectors and tRNAs, but this was modulated by the type of interacting host (insect or not insect) and the ability to be vectored by insects.

      Strengths:

      The main strength of this study lies in the size of the dataset, and the potentially high number of lifestyle transitions in Sordariomycetes.

      Weaknesses:

      The main strength of the study is not the clarity of the conclusions.

      (1) This is due firstly to the presentation of the hypotheses. The introduction is poorly structured and contradictory in some places. It is also incomplete since, for example, fungus-insect associations are not mentioned in the introduction even though they are explicitly considered in the analyses.

      We thank the reviewer for pointing this out. We strived to address all comments and suggestions of the reviewer to clarify the message and remove the contradictions. We also added information about why we included insect-association trait in our analysis. 

      (2) The lack of clarity also stems from certain biases that are challenging to control in microbial comparative genomics. Indeed, defining lifestyles is complicated because many fungi exhibit different lifestyles throughout their life cycles (for instance, symbiotic phases interspersed with saprotrophic phases). In numerous fungi, the lifestyle referenced in the literature is merely the sampling substrate (such as wood or dung), which doesn't mean that this substrate is a crucial aspect of the life cycle. This issue is discussed by the authors, but they do not eliminate the underlying uncertainties.

      We agree with the reviewer that lack of certainty in the lifestyle or range of possible lifestyles of studied species is a weakness in this analysis. We are limited by the information available in the literature. We hope that our study will increase interest in collecting such data in the future.

      Reviewer #3 (Public review):

      Summary:

      This important study combines comparative genomics with other validation methods to identify the factors that mediate genome size evolution in Sordariomycetes fungi and their relationship with lifestyle. The study provides insights into genome architecture traits in this Ascomycete group, finding that, rather than transposons, the size of their genomes is often influenced by gene gain and loss. With an excellent dataset and robust statistical support, this work contributes valuable insights into genome size evolution in Sordariomycetes, a topic of interest to both the biological and bioinformatics communities.

      Strengths:

      This study is complete and well-structured.

      Bioinformatics analysis is always backed by good sampling and statistical methods. Also, the graphic part is intuitive and complementary to the text.

      Weaknesses:

      The work is great in general, I just had issues with the Figure 1B interpretation.

      I struggled a bit to find the correspondence between this sentence: "Most genomic features were correlated with genome size and with each other, with the strongest positive correlation observed between the size of the assembly excluding repeats and the number of genes (Figure 1B)." and the Figure 1B. Perhaps highlighting the key p values in the figure could help.

      We thank the reviewer for pointing out this sentence. Perhaps the misunderstanding comes from the fact that in this sentence one variable is missing. The correct version should be “Most genomic features were correlated with genome size and with each other, with the strongest positive correlation observed between the genome size, the genome size excluding repeats and the number of genes (Figure 1B)”. Also, the variable names now correspond better to those shown on the figure.

      Reviewer #1 (Recommendations for the authors):

      The authors have clearly done a lot of good work, and I think this study is worthwhile. I understand that my concerns about the underlying data could necessitate rerunning the entire analysis with better gene models, but there may be another option. JGI has a fairly standard pipeline for gene and repeat annotation. Their gene predictions are based on RNA data from the sequenced strain and should be quite good in general. One could either compare the annotations from this manuscript to those in mycocosm for genomes that are identical and see if there are systematic biases, or rerun some analyses on a subset of genomes from mycocosm. Indeed, it's possible that the large dataset used here compensates for the above concerns, but without some attempt to evaluate these issues, it's difficult to have confidence in the results.

      We greatly appreciate the positive reception of our manuscript. Following the reviewer’s comments, we investigated gene annotations in comparison with those of JGI Mycocosm, even though only 58 species were matching and only 19 of them were from the same strain. This dataset is not representative of the Sordariomycetes diversity (most species come from one clade) and therefore will not reflect the results we obtained in this study. Of note, the reason for not choosing JGI Mycocosm in the first place was the poor representation of the insect-associated species, which we found key in this study. In general, we found that assembly lengths were nearly identical, the number of genes was higher, and the repeat content was lower for the JGI Mycocosm dataset. When comparing different lifestyles (in particular pathogens vs. non-pathogens), we found the same differences for our and JGI Mycocosm annotations, with one exception being the repeat content. In the small subset (19 same-strain assemblies), our dataset showed the same level of repeats between the two lifestyles, whereas JGI Mycocosm showed lower repeat content for pathogens (but notably for all 58 species, the trend was the same for our and JGI Mycocosm annotations). None of these observations are in conflict with our results where we find no or negative association of repeat content with pathogens.

      The figures are very information-dense. While I accept that this is somewhat of a necessity for presenting this type of study, if the authors could summarize the important information in easier-to-interpret plots, that could help improve readability.

      We put a lot of effort into showing these complicated results in as approachable a manner as possible. Given that other reviewers find them intuitive, we decided to keep most of them as they are. To add more clarification, we added one supplementary figure showing distributions of genomic traits across lifestyles. Moreover, in Figure 5, a phylogenetic tree was added with the position of selected clades, as well as a scatterplot showing distributions of mean values for genome size and number of genes for those clades. If the reviewer has any specific suggestions on what to improve and in which figure, we’re happy to consider them.

      Reviewer #2 (Recommendations for the authors):

      I have no major comments on the analyses, which have already been extensively revised. My major criticism is the presentation of the background, which is very insufficient to understand the importance or relevance of the results presented fully.

      Lines are not numbered, unfortunately, which will not help the reading of my review.

      (1) The introduction could better present the background and hypotheses:

      (a) After reading the introduction, I still didn't have a clear understanding of the specific 'genome features' the study focuses on. The introduction fails to clearly outline the current knowledge about the genetic basis of the pathogenic lifestyle: What is known, what remains unknown, what constitutes a correlation, and what has been demonstrated? This lack of clarity makes reading difficult.

      We thank the reviewer for pointing this out. We have now included in the introduction a list of genomic traits we focus on. We also tried to be more precise about demonstrated pathogenic traits and other correlated traits in the introduction. 

      (b) Page 3. « Various features of the genome have been implicated in the evolution of the pathogenic lifestyle. » The cited studies did not genuinely link genome features to lifestyle, so the authors can't use « implicated in » - correlation does not imply causation.

      This sentence also somehow contradicts the one at the end of the paragraph: « we still have limited knowledge of which genomic features are specific to pathogenic lifestyle »

      We thank the reviewer for this comment. We added a phrase “correlated with or implicated in” and changed the last sentence of the paragraph into “Yet we still have limited knowledge of how important and frequent different genomic processes are in the evolution of pathogenicity across phylogenetically distinct groups of fungi and whether we can use genomic signatures left by some of these processes as predictors of pathogenic state.”.

      (c) Page 3: « Fungal pathogen genomes, and in particular fungal plant pathogen genomes have been often linked to large sizes with expansions of TEs, and a unique presence of a compartmentalized genome with fast and slow evolving regions or chromosomes » Do the authors really need to say « often »? Do they really know how often?

      We removed “often”.

      (d) « Such accessory genomic compartments were shown to facilitate the fast evolution of effectors (Dong, Raffaele, and Kamoun 2015) ». The cited paper doesn't « show » that genomic compartments facilitate the fast evolution of effectors. It's just an observation that there might be a correlation. It's an opinion piece, not a research manuscript.

      We changed the sentence to “Such accessory genomic compartments could facilitate the fast evolution of effectors”.

      (e) « even though such architecture can facilitate pathogen evolution, it is currently recognized more as a side effect of a species evolutionary history rather than a pathogenicity related trait ». This sentence somehow contradicts the following one: « Such accessory genomic compartments were shown to facilitate the fast evolution of effectors ».

      Here we wanted to point out that even though accessory genome compartments and TE expansions can facilitate pathogen evolution the origin of such architecture is not linked to pathogenicity. We reformulated the sentence to “Even though such architecture can facilitate pathogen evolution, it is currently recognized that its origin is more likely a side effect of a species evolutionary history rather than being caused by pathogenicity”.

      (f) « As the number of genes is strongly correlated with fungal genome size (Stajich 2017), such expansions could be a major contributor to fungal genome size. » This sentence suggests that pathogens might have bigger genomes because they have more effectors. This is contradictory to the sentence right after « At the end of the spectrum are the endoparasites Microsporidia, which have among the smallest known fungal genomes ».

      The authors state that pathogens have bigger genomes and then they take an example of a pathogen that has a minimal genome. I know it's probably because they lost genes following the transition to endoparasitism and not related to their capacity to cause disease. I just want to point out that their writing could be more precise. I invite authors to think of young scholars who are new to the field of fungal evolutionary genomics.

      We thank the reviewer for prompting us to clarify the text. We rewrote this short extract as follows “Notably, not all pathogenic species experience genome or gene expansions, or show compartmentalized genome architecture. While gene family expansions are important for some pathogens, the contrary can be observed in others, such as Microsporidia. Due to transition to obligatory intracellular lifestyle these fungi show signatures of strong genome contractions and reduced gene repertoire (Katinka et al. 2001) without compromising their ability to induce disease in the host. This raises questions about universal genomic mechanisms of transition to pathogenic state.”

      (g) I find it strange that the authors do not cite - and do not present the major results of two other studies that use the same type of approach and ask the same type of question in Sordariomycetes, although not focusing on pathogenicity:

      Hensen et al.: https://pubmed.ncbi.nlm.nih.gov/37820761/

      Shen et al.: https://pubmed.ncbi.nlm.nih.gov/33148650/

      We thank the reviewer for pointing out this omission. We now added more information in the introduction to highlight the importance of the phylogenetic context in studying genome evolution as demonstrated by these studies. The following part was added to introduction:  “Other phylogenomic studies investigating a wide range of Ascomycete species, while not explicitly focusing on the neutral evolution hypothesis, have found strong phylogenetic signals in genome evolution, reflected in distinct genome characteristics (e.g., genome size, gene number, intron number, repeat content) across lineages or families (Shen et al. 2020; Hensen et al. 2023). Variation in genome size has been shown to correlate with the activity of the repeat-induced point mutation (RIP) mechanism (Hensen et al. 2023; Badet and Croll 2025), by which repeated DNA is targeted and mutated. RIP can potentially lead to a slower rate of emergence of new genes via duplication (Galagan et al. 2003), and hinder TE proliferation limiting genome size expansion (Badet and Croll 2025). Variation in genome dynamics across lineages has also been suggested to result from environmental context and lifestyle strategies (Shen et al. 2020), with Saccharomycotina yeast fungi showing reductive genome evolution and Pezizomycotina filamentous fungi exhibiting frequent gene family expansions. Given the strong impact of phylogenetic membership,  demographic history (Ne) and host-specific adaptations of pathogens on their genomes, we reasoned that further examination of genomic sequences in groups of species with various lifestyles can generate predictions regarding the architecture of pathogenic genomes.”

      (h) Genome defense mechanisms against repeated elements, such as RIP, are not mentioned while they could have a major impact on genome size (Hensen et al cited above; Badet and Croll https://www.biorxiv.org/content/10.1101/2025.01.10.632494v1.full).

      This citation is added in the text above.

      (i) Should the reader assume that the genome features to be examined are those mentioned in the first paragraph or those in the penultimate one?

      In the last paragraph of the introduction we included the complete list of investigated genomic traits.

      (j) The insect-associated lifestyle is mentioned only in the research questions on page 4, but not earlier in the introduction. Why should we care about insect-associated fungi?

      We apologize for this omission. We added a sentence explaining how neutral evolution hypotheses can explain patterns of genome evolution in endoparasites and species with specialized vectors (traits present in insect-associated species) and added a sentence in the last paragraph that this is the reason why we have selected this trait for analysis.  

      (2) Why use concatenation to infer phylogeny?

      (a) Kapli et al. https://pubmed.ncbi.nlm.nih.gov/32424311/ « Analyses of both simulated and empirical data suggest that full likelihood methods are superior to the approximate coalescent methods and to concatenation »

      (b) It also seems that a homogeneous model was used, and not a partitioned model, while the latter are more powerful. Why?

      We thank the reviewer for the comment. When we were reconstructing the phylogenetic tree, we were not aware of this publication, and we followed common practices from the literature for phylogenetic tree reconstruction, even though they are currently not regarded as optimal. In fact, in the first round of submission, we included a concatenation method as well as a multispecies coalescent method based on 1000 BUSCO sequences and a concatenation method with different partitions for 250 BUSCO sequences. All three methods produced similar topologies. Since the results were concordant, we chose to omit these analyses from the manuscript to streamline the presentation and focus on the most important results.

      (3) Other comments:

      Is there a table listing lifestyles?

      Yes, lifestyles (pathogenicity and insect-association) are listed in Supplementary Table S1. 

      (4) Summary:

      (a) « seemingly similar pathogens »: meaning unclear; on what basis are they similar? why « seemingly »?

      We removed “seemingly” from the sentence.

      (b) Page 4: what's the difference between genome feature and genome trait?

      There is no difference. We apologize for the confusion. We changed “feature” to “trait” whenever it refers to the specific 13 genomic traits analyzed in this study.

      (c) Page 22: Braker, not Breaker

      Corrected.

      What do the authors mean when they write that genes were predicted with Augustus and Braker? Do they mean that the two sets of gene models were combined? Gene counts are based on Augustus (P24): why not Braker?

      We only meant here that gene annotation was performed using Braker pipeline, which uses a particular version of Augustus. We corrected the sentence.

      (d) Figure 2B and 2C:

      'Undetermined sign' or 'Positive/Negative' would be better than « YES » or it's just impossible to understand the figure without reading the legend.

      We changed “YES” to “UNDETERMINED SIGN” as suggested by the reviewer.

    1. eLife Assessment

      This valuable study uses a sophisticated array of techniques to investigate the mechanisms through which the chordotonal receptors in the locust ear (Müller's organ) sense auditory signals. Ultrastructural reconstruction of the sensory organ provides convincing evidence of the organization of the scolopidial structure that wraps the sensory neuron cilium. However, the recordings of sound-evoked motion and electrophysiological activity from the chordotonal sensory neurons provide incomplete evidence for the proposed axial stretch model of mechanotransduction.

    2. Reviewer #1 (Public review):

      Chaiyasitdhi et al. set out to investigate the detailed ultrastructure of the scolopidia in the locust Müller's organ, the geometry of the forces delivered to these scolopidia during natural stimulation, and the direction of forces that are most effective at eliciting transduction currents. To study the ultrastructure, they used the FIB-SEM technique, to study the geometry of natural stimulation, they used OCT vibrometry and high-speed light microscopy, and to study transduction currents, they used patch clamp physiology.

      Strengths:

      I believe that the ultrastructural description of the locust scolopidium is excellent and the first of its kind in any insect system. In particular, the finding of the bend in the dendritic cilium and the position of the ciliary dilation are interesting, and it would be interesting to see whether these are common features within the huge diversity of insect chordotonal organs.

      I believe the use of OCT to measure organ movements is a significant strength of this paper; however, using ex vivo preparations undermines any conclusions drawn about the system's in vivo mechanics.

      The choice of Group III scolopidia is also good. Research on the mechanics of locust tympana has shown that travelling waves are formed on the tympanum and waves of different frequencies show highest amplitudes at different positions on the tympanum, and therefore also on different groups of scolopidia within the Müller's organ (Windmill et al, 2005; 2008, and Malkin et al, 2013). The lowest frequency modal waves (F0) observed by Windmill et al 2008 were at about 4.4 kHz, which are slightly higher than the ~3 kHz frequencies studied in this paper but do show large deflections where these group III scolopidia attach at the styliform body (Windmill et al, 2005).

      This should be mentioned in the paper since the electrophysiology justification to use group III neurons is less convincing, given that Jacobs et al 1999 clearly point out that group III neurons are very variable and some of them are tuned much higher to 10 kHz, and others even higher to 20-30 kHz.

      Weaknesses:

      Specifically, it is understandable that the authors decided to use excised ears for the light microscopy, where Müller's organ would not be accessible in situ. However, it is very likely that excision will change the system's mechanics, especially since any tension or support to Müller's organ will be ablated. OCT enables in vivo measurements in fully undissected systems (Mhatre et al, Biorxiv, 2021) or in systems with minimal dissection where the mechanics have not been compromised (Vavakou et al, 2021). The choice to entirely dissect out the membrane is difficult to understand here.

      My main concern with this paper, however, is the use of light microscopy very close to the Nyquist limit to study scolopidial motion, and the fact that the OCT data contradict and do not match the light microscopy data.

      The light microscopy data is collected at ~8 kHz, and hence the Nyquist limit is ~4 kHz. It is possible to measure frequencies reliably this close to the limit, but the amplitude of motion is quite likely to be underestimated, given that the technique only provides 2 sample points per cycle at 4 kHz and approximately 2.66 sample points at 3 kHz. At that temporal resolution, the samples are much more likely to miss the peak of the wave than not, and therefore, amplitudes will be misestimated. A much more reasonable sample rate for amplitude estimation is generally about 10 samples per cycle. I do not believe the data from the microscopy is reliable for what the authors wish to use them for.
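
      The sampling argument above can be made concrete with a minimal sketch. It assumes a noise-free, unit-amplitude sine and estimates amplitude by simple peak picking over a fixed number of frames; neither assumption is taken from the manuscript, the function names are hypothetical, and the sketch does not model whatever analysis the authors actually applied.

```python
import math


def samples_per_cycle(sample_rate_hz, signal_hz):
    """Number of sample points acquired per period of the signal."""
    return sample_rate_hz / signal_hz


def observed_peak(sample_rate_hz, signal_hz, phase_rad, n_samples):
    """Largest absolute sample of a unit-amplitude sine (true peak = 1)."""
    return max(
        abs(math.sin(2 * math.pi * signal_hz * k / sample_rate_hz + phase_rad))
        for k in range(n_samples)
    )


def worst_case_peak(sample_rate_hz, signal_hz, n_samples, n_phases=360):
    """Smallest observed peak over a grid of starting phases."""
    return min(
        observed_peak(sample_rate_hz, signal_hz,
                      2 * math.pi * p / n_phases, n_samples)
        for p in range(n_phases)
    )


FRAME_RATE_HZ = 8000  # approximate frame rate quoted in the review
N_FRAMES = 64         # hypothetical record length (assumption)

at_nyquist = worst_case_peak(FRAME_RATE_HZ, 4000, N_FRAMES)
at_3khz = worst_case_peak(FRAME_RATE_HZ, 3000, N_FRAMES)
```

      Under these assumptions, a 4 kHz signal can be missed entirely when the samples fall on its zero crossings, while a 3 kHz signal is recovered with an underestimated peak for unfavourable phases. Shorter records or noisy data would be expected to make the estimate less reliable.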

      Using the light microscopy data, the authors claim that the strains experienced by the group III scolopidia at 3 kHz are greater along the AP axis than the ML axis (Figure 4). However, this is contradicted by the OCT data, which show very low strain along the AP axis (black traces) at and around 3 kHz (Figure 3c and extended data Figure 2f) and show some movement along the ML axis (red traces, same figures). The phase at low amplitudes of motion cannot be considered very reliable either, and hence phase variations at these frequencies in the OCT cannot be considered reliable indicators of AP motion; hence, I'm unclear whether the vector difference in the OCT is a reliable indicator of movement.

      The OCT data are significantly more reliable as they are acquired at an appropriate sampling rate of 90 kHz. The authors do not mention what microphone they use to monitor or calibrate their sound field and phase measurements in OCT, but I presume this was done since it is the norm. Thus, the OCT data show that the movement within the Müller's organ is complex, probably traces an ellipse at some frequencies as observed in bushcrickets (Vavakou et al, 2021) and also thought to be the case in tree crickets based on the known attachment points of the TO (Mhatre et al, 2021). The OCT data shows relatively low AP motion at frequencies near 3 kHz, and higher ML motion, which contradicts the less reliable light microscopy data. Given that the locust membrane shows peaks in motion at ~4.5 kHz, ~11 kHz, and also at ~20 kHz (Windmill et al, 2008), I am surprised that the authors limited their OCT experiments and analyses to 5 kHz.

      In summary for this section, I am not convinced of the conclusion drawn by the authors that group III scolopidia receive significantly higher stimulation along the AP axis in their native configuration, if indeed they were studied in the appropriate force regime (altered due to excision).

      In the scolopidial patch clamp data, the authors study transduction currents in response to steady state stimulation along the AP axis and the ML axis. The responses to steady state and periodic forces may well be different, and the authors do not offer us a way to clearly relate the two and therefore, to interpret the data.

      In addition, both stimulation types, along the AP axis and the ML, elicit clear transduction responses. Stimulation along the AP axis might be slightly higher, but there is over 40% variation around the mean in one case (pull: 26.22 ± 10.99 pA) and close to 80% variation in the other (push: 10.96 ± 8.59 pA). These data are indeed from a very high displacement range (2000 nm), which is very high compared to the native displacement levels, which are in the 1-10 nm range.
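
      The percentages quoted above follow from dividing each reported spread by its mean. The sketch below reproduces that arithmetic; it assumes the value after the ± sign is a standard deviation, which is not stated in this excerpt, and the function name is hypothetical.

```python
def coefficient_of_variation(mean: float, spread: float) -> float:
    """Spread expressed as a percentage of the mean."""
    return 100.0 * spread / mean


# (mean, spread) in pA, as quoted in the review for 2000 nm displacements.
# Treating the spread as a standard deviation is an assumption.
currents_pa = {"pull": (26.22, 10.99), "push": (10.96, 8.59)}

if __name__ == "__main__":
    for label, (mean, spread) in currents_pa.items():
        cv = coefficient_of_variation(mean, spread)
        print(f"{label}: {cv:.1f}% of the mean")
```

      This gives roughly 42% for pull and 78% for push, consistent with the "over 40%" and "close to 80%" figures in the comment.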

      The factor change from sample to sample is not reported, and is small even overall. The statistical analyses of these data are not clearly reported, and I don't see the results of the overall ANOVA in the results section. I also find the dip in the reported transduction currents between 10 and 100 nm quite odd (Figure 5 j-m) and would like to know what the authors' interpretation of this behaviour is. It seems to me that those currents increase continuously linearly after ~50-100 nm and that the data below that range are in the noise. Thus, the transduction currents observed at the relevant displacement range (1-10 nm) may not actually be reliable. How were these small displacements achieved, and how closely were the actual levels monitored? Is it possible to reliably deliver 1-10 nm displacements using a micromanipulator?

      What is clear, despite the difficulty in interpreting this data, is that both AP and ML stimulation evoke transduction currents, and their relative differences are small. Additionally, in Müller's organ itself, in the excised organ, the scolopidia are stimulated along both axes. Thus, in my opinion, it is not possible to say that axial stretch along the cilium is 'the key mechanical input that activates mechano-electrical transduction'.

    3. Reviewer #2 (Public review):

      Summary of strengths and weaknesses:

      Using several techniques (FIB-SEM, OCT, high-speed light microscopy, and electrophysiology), Chaiyasitdhi et al. provide evidence that chordotonal receptors in the locust ear (Müller's organ) sense the stretch of the scolopale cell, primarily of its cilium. Careful measurements certainly show cell stretch, albeit with some inconsistencies regarding best frequencies and amplitudes. The weakest argument concerns the electrophysiological recordings, because the authors do not show directly that the stimulus stretches the cells. If this latter point can be clarified, then our confidence that ciliary stretch is the proximal stimulus for mechanotransduction will be increased. This conclusion will not come as a surprise for workers in the field, as the chordotonal organ is known as a stretch-receptor organ (e.g., Wikipedia). But it is a useful contribution to the field and allows the authors to suggest transduction mechanisms whereby ciliary stretch is transduced into channel opening.

    4. Reviewer #3 (Public review):

      Summary:

      The paper 'A stretching mechanism evokes mechano-electrical transduction in auditory chordotonal neurons' by Chaiyasitdhi et al. presents a study that aims to address the mechanical model for scolopidia in Schistocerca gregaria Müller's organ, the basic mechanosensory units in insect chordotonal organs. The authors combine high-resolution ultrastructural analysis (FIB-SEM), sound-evoked motion tracking (OCT and high-speed light microscopy), and electrophysiological recordings of transduction currents during direct mechanical stimulation of individual scolopidia. They conclude that axial stretching along the ciliary axis is an adequate mechanical stimulus for activating mechanotransduction channels.

      Strengths/Highlights:

      (1) The 3D FIB-SEM reconstruction provides high resolution of scolopidial architecture, including the newly described "scolopale lid" and the full extent of the cilium.

      (2) High-speed microscopy clearly demonstrates axial stretch as the dominant motion component in the auditory receptors, which confirms a long-standing question of what the actual motion of a stretch receptor is upon auditory stimulation.

      (3) Patch-clamp recordings directly link mechanical stretch to transduction currents, a major advance over previous indirect models.

      Weaknesses/Limitations:

      (1) The text is conceptually unclear or written in an unclear manner in some places, for example, when using the proposed model to explain the sensitivity of Nanchung-Inactive in the discussion.

      (2) The proposed mechanistic models (direct-stretch, stretch-compression, stretch-deformation, stretch-tilt) are compelling but remain speculative without direct molecular or biophysical validation. For example, examining whether the organ is pre-stretched and identifying the mechanical components of cells (tissues), such as the extracellular matrix and cytoskeleton, would help establish the mechanical model and strengthen the conclusion.

      (3) To some extent, the weaknesses of the paper are part of its strengths and vice versa. For example, the direct push/pull and up/down stimulations are a great experimental advance to approach an answer to the question of how the underlying cellular components are deformed and how the underlying ion channels are forced. However, as the authors clearly state, neither of their stimulations can limit all forces to only one direction, and both orthogonal forces evoke responses in the neurons. The question of which of the two orthogonal forces 'causes' the response cannot be answered with these experiments and has not been answered by this manuscript. But the study has brought the field a considerable step closer to answering the question. The answer, however, might be that both longitudinal ('stretch') and perpendicular ('compression') forces act together to open the ion channels and that both dendritic extension via stretch and bending can provide forces for ion channel gating. The current paper has identified major components (longitudinal stretch components) for the neurons they analysed, but these will surely have been chosen according to their accessibility, and as such, the variety of mechanical responses in Müller's organ might be greater. In light of these considerations, the authors might acknowledge such uncertainties more clearly in their paper. The paper is an impressive methodological progress and breakthrough, but it simply does not "demonstrate that axial stretch along the cilium is the adequate stimulus or the key mechanical input that activates mechano-electrical transduction" as the authors write at the start of their discussion. They do show that axial stretch dominates for the neurons they looked at, which is important information. 
      The same applies to the end of the discussion: The authors write, "This relative motion within the organ then drives an axial stretch of the scolopidium, which in turn evokes the mechano-electrical transduction current." Reading the manuscript, the certainty and display of confidence are not substantiated by the data provided. But they are also not necessary. The study has paved the road to answer these questions. Instead, the authors are encouraged to make suggestions on how the remaining uncertainties could be removed (and what experiments or model might be used).

    5. Author response:

      Reviewer #1 (Public review):

      Chaiyasitdhi et al. set out to investigate the detailed ultrastructure of the scolopidia in the locust Müller's organ, the geometry of the forces delivered to these scolopidia during natural stimulation, and the direction of forces that are most effective at eliciting transduction currents. To study the ultrastructure, they used the FIB-SEM technique, to study the geometry of natural stimulation, they used OCT vibrometry and high-speed light microscopy, and to study transduction currents, they used patch clamp physiology.

      Strengths:

      I believe that the ultrastructural description of the locust scolopidium is excellent and the first of its kind in any insect system. In particular, the finding of the bend in the dendritic cilium and the position of the ciliary dilation are interesting, and it would be interesting to see whether these are common features within the huge diversity of insect chordotonal organs.

      Thank you very much for your comments. We indeed plan to extend and continue our approach to exploit and understand diverse chordotonal organs in insects and crustaceans.

      I believe the use of OCT to measure organ movements is a significant strength of this paper; however, using ex vivo preparations undermines any conclusions drawn about the system's in vivo mechanics.

      Having re-read the manuscript, we realise that we failed to describe our ex vivo preparation of Müller’s organ explicitly, or to include the key references showing that its physiological function is largely retained. We have now revised this detail in the Methods section:

      “We used an excised locust ear preparation for all experiments, following a previously described dissection protocol [9]. In short, the tympanum, with Müller’s organ attached, was left intact, suspended within the cuticular rim. The cuticular rim of the tympanum was fixed into a hole in a preparation dish that allowed Müller’s organ to be submerged in extracellular saline, whilst the outside of the tympanum was dry and could be stimulated with airborne sound. This ex vivo preparation of Müller’s organ retained frequency tuning (Warren & Matheson, 2018), electrophysiological function similar to that of freshly dissected Müller’s organs (Hill, 1983a, 1983b; Michelsen, 1968: frequency discrimination in the locust ear by means of four groups of receptor cells), and amplitude coding (Warren & Matheson, 2018). Since Müller’s organ is backed by an air-filled trachea in vivo, the addition of saline solution in the ex vivo preparation decreased its displacements ~100 fold due to a damping effect (Warren et al., 2020).”

      And in the last section of the introduction:

      “Here, we combined FIB-SEM to resolve the 3D ultrastructure of a scolopidium, OCT and high-speed microscopy to examine sound-evoked motion at both the organ and individual scolopidium levels, and direct mechanical stimulation of the scolopale cap, where the ciliary tip is anchored, whilst simultaneously recording transduction currents. Müller’s organ and the tympanum were excised from the locust for physiological experiments. This ex vivo preparation of Müller’s organ retained frequency tuning, amplitude coding and electrophysiological function. This preparation also permitted the enzymatic isolation of individual scolopidia whilst recording transduction currents (Warren & Matheson, 2018).”

      To further clarify physiological differences between the in vivo and ex vivo operation of the tympanum and Müller’s organ, we will perform an additional experiment for the revised manuscript by quantifying the changes in the sound-evoked tonotopic travelling wave of the tympanum using Laser Doppler Vibrometry (LDV). This result will be added to the Supplementary Text.

      The choice of Group III scolopidia is also good. Research on the mechanics of locust tympana has shown that travelling waves are formed on the tympanum and waves of different frequencies show highest amplitudes at different positions on the tympanum, and therefore also on different groups of scolopidia within the Müller's organ (Windmill et al, 2005; 2008, and Malkin et al, 2013). The lowest frequency modal waves (F0) observed by Windmill et al 2008 were at about 4.4 kHz, which are slightly higher than the ~3 kHz frequencies studied in this paper but do show large deflections where these group III scolopidia attach at the styliform body (Windmill et al, 2005).

      Thank you very much. We accept that the frequencies studied in this manuscript were lower than the lowest modal wave observed by Windmill et al., 2008. According to Jacobs et al., 1999, other authors found broad tuning from 3.4-3.74 kHz (Michelsen et al., 1971) and 2-3.5 kHz (Halex et al., 1988). We settled on the tuning previously measured for Group-III neurons in the same kind of preparation as in this manuscript, which was broadly around 3 kHz (Warren & Matheson, 2018).

      This should be mentioned in the paper since the electrophysiology justification to use group III neurons is less convincing, given that Jacobs et al 1999 clearly point out that group III neurons are very variable and some of them are tuned much higher to 10 kHz, and others even higher to 20-30 kHz.

      Looking at Fig. 7 from Jacobs et al., 1999, we indeed see that the four Group-III neurons recorded in that study are broadly tuned to 3-4 kHz. Often these tuning curves have additional threshold dips at higher frequencies, but with thresholds at least 20 dB higher. We settled on the most sensitive frequency that we previously measured, which also overlaps the most sensitive frequencies from several other studies.

      Weaknesses:

      Specifically, it is understandable that the authors decided to use excised ears for the light microscopy, where Müller's organ would not be accessible in situ. However, it is very likely that excision will change the system's mechanics, especially since any tension or support to Müller's organ will be ablated.

      We completely understand this criticism. We have now added descriptions to the Methods and Introduction (as detailed previously). In short, the tympanum was left intact, suspended on the cuticle. Müller’s organ retains all (measured) physiological properties: frequency tuning, amplitude coding and electrophysiological function. To further investigate whether this excised preparation is representative of in vivo conditions, we plan to measure tympanal mechanics, such as the travelling wave, as part of the revisions.

      OCT enables in vivo measurements in fully undissected systems (Mhatre et al, Biorxiv, 2021) or in systems with minimal dissection where the mechanics have not been compromised (Vavakou et al, 2021). The choice to entirely dissect out the membrane is difficult to understand here.

      The pioneering OCT work of Mhatre et al., bioRxiv, 2021 and Vavakou et al., 2021 set a new standard for in vivo measurements in the field. We also fully agree with Reviewer #1’s view that OCT is best performed on Müller’s organ in vivo, and we tried in vivo OCT imaging of Müller’s organ for several months. Although the OCT beam penetrates the tympanum, it does not penetrate the tracheal air sac that surrounds Müller’s organ, and therefore OCT cannot be used in vivo. Please also see our previous comment regarding the intact physiological operation of Müller’s organ in the ex vivo preparation.

      My main concern with this paper, however, is the use of light microscopy very close to the Nyquist limit to study scolopidial motion, and the fact that the OCT data contradict and do not match the light microscopy data. The light microscopy data is collected at ~8 kHz, and hence the Nyquist limit is ~4 kHz. It is possible to measure frequencies reliably this close to the limit, but the amplitude of motion is quite likely to be underestimated, given that the technique only provides 2 sample points per cycle at 4 kHz and approximately 2.66 sample points at 3 kHz. At that temporal resolution, the samples are much more likely to miss the peak of the wave than not, and therefore, amplitudes will be mis-estimated. A much more reasonable sample rate for amplitude estimation is generally about 10 samples per cycle. I do not believe the data from the microscopy is reliable for what the authors wish to use them for.

      We understand your concern that the sound-evoked motion of the scolopidium was studied by light microscopy near the Nyquist limit (our average sampling rate was 8.6 ± 0.3 kHz, giving a Nyquist limit of 4.3 kHz). We also agree that the amplitude of the motion could be underestimated at frequencies close to the limit. However, we find that this systematic error does not change the key observation from our light microscopy data: that axial stretch of the scolopidium occurs around 3 kHz.

      To address this concern, we plan to study the scolopidial motion within Group 1 auditory neurons, which are tuned to lower frequencies (0.5-1.5 kHz). This new set of data will allow us to obtain more data points per cycle (up to ~8.6 data points at 1 kHz). We will consider adding this result into the revised Fig. 4 or its extended data.
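
      The figures quoted in this reply can be checked with a few lines of arithmetic. The sketch below is illustrative rather than part of the planned analysis; it takes the stated mean sampling rate of 8.6 kHz as given, and the function names are hypothetical.

```python
def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency that can be represented without aliasing."""
    return sample_rate_hz / 2.0


def points_per_cycle(sample_rate_hz: float, stimulus_hz: float) -> float:
    """Data points acquired during one period of the stimulus."""
    return sample_rate_hz / stimulus_hz


SAMPLE_RATE_HZ = 8600.0  # mean camera sampling rate stated in the response

if __name__ == "__main__":
    print(f"Nyquist limit: {nyquist_hz(SAMPLE_RATE_HZ):.0f} Hz")
    # Group 1 range (0.5-1.5 kHz) and the ~3 kHz Group-III frequency
    for stimulus_hz in (500.0, 1000.0, 1500.0, 3000.0):
        n = points_per_cycle(SAMPLE_RATE_HZ, stimulus_hz)
        print(f"{stimulus_hz:6.0f} Hz: {n:5.2f} data points per cycle")
```

      At 1 kHz this gives 8.6 data points per cycle, compared with just under 3 at 3 kHz, which is the improvement the additional Group 1 measurements are intended to provide.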

      Regarding the sampling rate, we did try to achieve a higher rate (> 10 kHz). However, our camera imposes a technical limit, with a trade-off against other key parameters such as the size of the region of interest (ROI) and the magnification. To increase the sampling rate, we would have to reduce the magnification or the ROI; we would then either lose the spatial resolution required to quantify the scolopidial motion, or the ROI would no longer cover the whole scolopidial motion. A sampling rate of 8.6 ± 0.3 kHz was the best we could achieve.

      Using the light microscopy data, the authors claim that the strains experienced by the group III scolopidia at 3 kHz are greater along the AP axis than the ML axis (Figure 4). However, this is contradicted by the OCT data, which show very low strain along the AP axis (black traces) at and around 3 kHz (Figure 3c and extended data Figure 2f) and show some movement along the ML axis (red traces, same figures). The phase at low amplitudes of motion cannot be considered very reliable either, and hence phase variations at these frequencies in the OCT cannot be considered reliable indicators of AP motion; hence, I'm unclear whether the vector difference in the OCT is a reliable indicator of movement.

      This is our fault for not clearly explaining the orientation of the light microscopy measurements, which led to the reviewer’s concern about a contradiction between OCT and light microscopy. Our OCT measurements were made along the Antero-Posterior (AP) and Mesio-Lateral (ML) axes, while the axial stretch of the scolopidium occurs along the Dorso-Ventral (DV) axis. We recognise that the anatomical references in this manuscript can be confusing, and we tried to show the orientation of the scolopidium relative to Müller’s organ in Fig. 3b. To further clarify the orientation of our observations, we will add anatomical references to Fig. 4a and Fig. 5a in the revised manuscript.

      As stated in our Results section (Lines 165-167):

      “Notably, we could not resolve the Group-III scolopidia along the ventro-dorsal axis—which runs parallel to the dendrite—as the OCT beam was obstructed by either the cuticle or the elevated process”

      We did try to perform OCT measurement along the VD axis, but we could not resolve the scolopidial region along the scolopidial or ciliary axes because the OCT beam could not go through the thick cuticle at the edge of the tympanic membrane and the elevated process. For this reason, it is impossible for us to find an agreement or rule out any contradiction between the OCT and light microscopy since they are measuring motion along different axes. We plan to address this accessibility issue in a separate work using OCT measurements in combination with mirrors.

      The OCT data are significantly more reliable as they are acquired at an appropriate sampling rate of 90 kHz. The authors do not mention what microphone they use to monitor or calibrate their sound field and phase measurements in OCT, but I presume this was done since it is the norm.

      We used a condenser microphone (MK301, Microtech) and a measuring amplifier (type 2610, Brüel & Kjær) for calibration. The calibration microphone was itself calibrated beforehand using a sound calibrator (type 4231, Brüel & Kjær).

      Thus, the OCT data show that the movement within the Müller's organ is complex and probably traces an ellipse at some frequencies, as observed in bushcrickets (Vavakou et al, 2021) and also thought to be the case in tree crickets based on the known attachment points of the tympanal organ (Mhatre et al, 2021). The OCT data show relatively low AP motion at frequencies near 3 kHz, and higher ML motion, which contradicts the less reliable light microscopy data. Given that the locust membrane shows peaks in motion at ~4.5 kHz, ~11 kHz, and also at ~20 kHz (Windmill et al, 2008), I am surprised that the authors limited their OCT experiments and analyses to 5 kHz.

      We found that immediately above 5 kHz the displacements fell to undetectable magnitudes. We accept that there may be other modes of vibration at higher frequencies (>10 kHz, based on Jacobs et al., 1999) that we could have detected with OCT. However, we focused our analysis on Group-III neurons at their best frequency and on frequencies that we could cross-compare between our high-speed imaging system and OCT.

      In summary for this section, I am not convinced of the conclusion drawn by the authors that group III scolopidia receive significantly higher stimulation along the AP axis in their native configuration, if indeed they were studied in the appropriate force regime (altered due to excision).

      Again, we accept our fault in not clearly displaying the anatomical references for the scolopidial and ciliary axes in Fig. 4 and Fig. 5. We also did not describe in sufficient detail that our ex vivo preparation largely retains its physiological properties. We will address the errors in our measurements near the Nyquist limit and provide additional data from Group 1 scolopidia, where we can achieve more data points per cycle.

      In the scolopidial patch clamp data, the authors study transduction currents in response to steady state stimulation along the AP axis and the ML axis. The responses to steady state and periodic forces may well be different, and the authors do not offer us a way to clearly relate the two and therefore, to interpret the data.

      We will revise Fig. 5a to clarify that the push-pull displacements were applied along the Dorso-Ventral (DV) axis and the up-down displacements along the Antero-Posterior (AP) axis. We do agree that steady-state and periodic forces may well be very different. However, valuable insight can be gained from mechanical systems when they are displaced outside their normal physiological frequency range (e.g. the transformative work on vertebrate hair bundle mechanics, Howard & Hudspeth, 1988). For the same reason, we believe artificial stimulation of the scolopidium gives us new and crucial information for understanding scolopidial mechanics. Our main finding, that stretch is the dominant stimulus under step displacement, should still hold for periodic motion, or should at least provide strong support for stretch being the dominant stimulus there.

      In addition, both stimulation types, along the AP axis and the ML, elicit clear transduction responses. Stimulation along the AP axis might be slightly higher, but there is over 40% variation around the mean in one case (pull: 26.22 ± 10.99 pA) and close to 80% variation in the other (push: 10.96 ± 8.59 pA). These data are indeed from a very high displacement range (2000 nm), which is very high compared to the native displacement levels, which are in the 1-10 nm range.

      In this experiment, we wished to establish the upper limits (and plateau region) of displacement-transduction current response. However, even at 2000 nm we still did not see a plateau. Therefore, we believe that the strain on the scolopidium is still in the operating range even though our displacement is not. This discrepancy can be explained because the base of the scolopidium is not fixed. Therefore, the displacement imposed in our experiment is not equivalent to the strain on the cilium but a combination of pulling and stretching along the length of the dendrite. The force, however, remains along that particular axis, supporting our main finding.

      Another important consideration is that the cilium is surrounded by the scolopale wall. It is assumed that the scolopale wall is far stiffer than the cilium and will therefore limit the amount of ciliary strain.

      The factor change from sample to sample is not reported and is small even overall. The statistical analyses of these data are not clearly reported, and I don't see the results of the overall ANOVA in the results section.

      We reported the statistical analyses in the Fig. 5 Source Data. We will now add tables displaying these statistics in the supplementary text of the revised manuscript.

      I also find the dip in the reported transduction currents between 10 and 100 nm quite odd (Figure 5 j-m) and would like to know what the authors' interpretation of this behaviour is. It seems to me that those currents increase continuously linearly after ~50-100 nm and that the data below that range are in the noise. Thus, the transduction currents observed at the relevant displacement range (1-10 nm) may not actually be reliable. How were these small displacements achieved, and how closely were the actual levels monitored? Is it possible to reliably deliver 1-10 nm displacements using a micromanipulator?

      One interpretation is that the cilium has both sensitive and insensitive mechanically gated ion channels, a finding that is also supported by Effertz et al., 2012. We will add a sentence to the discussion highlighting this interpretation. We will also provide our calibration of displacement against the voltage delivered to the piezo in the Supplementary Text.

      What is clear, despite the difficulty in interpreting this data, is that both AP and ML stimulation evoke transduction currents, and their relative differences are small. Additionally, in Müller's organ itself, in the excised organ, the scolopidia are stimulated along both axes. Thus, in my opinion, it is not possible to say that axial stretch along the cilium is 'the key mechanical input that activates mechano-electrical transduction'.

      We confirm that the scolopidia are displaced along both axes. We also note that displacements of the scolopidium limited to the up-down axis will also produce a strain on the scolopidium along the push-pull axis. However, we tried to disentangle this complex motion by limiting the displacements to one axis during recordings of the transduction current. We found that displacement along the scolopidial axis generated the largest transduction currents. Even though there is large variation, our statistical analysis confirmed a significant difference, as stated in the Results section (Lines 283-286):

      “Additionally, the transduction current evoked by pull from the resting position was larger than displacement upward, 12.17 ± 5.37 pA (N = 11, n = 11) (Tukey's procedure, p = 1.75e-03, t = -3.83) or downward 7.28 ± 9.76 pA (N = 11, n = 11) (Tukey's procedure, p = 5.10e-06, t = -4.53).”

      The reason for large variation is that the discrete depolarisations (random depolarisations of unknown function and a common feature of chordotonal neurons so far recorded) have a similar magnitude to the transduction current produced by the step displacements. We will highlight these discrete depolarisations in Figure 4d and mention them in the results.

      Reviewer #2 (Public review):

      Summary of strengths and weaknesses:

      Using several techniques (FIB-SEM, OCT, high-speed light microscopy, and electrophysiology), Chaiyasitdhi et al. provide evidence that chordotonal receptors in the locust ear (Müller's organ) sense the stretch of the scolopale cell, primarily of its cilium. Careful measurements certainly show cell stretch, albeit with some inconsistencies regarding best frequencies and amplitudes.

      Thank you very much for acknowledging the strength of our study. Regarding the inconsistencies between best frequencies and amplitudes, we believe that this concern largely arises from our failure to display clearly the anatomical references for the scolopidial and ciliary axes in Fig. 4 and Fig. 5. As previously addressed in our response to Reviewer #1, we will add the anatomical references and revise the text to clarify the orientation of our measurements.

      The weakest argument concerns the electrophysiological recordings, because the authors do not show directly that the stimulus stretches the cells. If this latter point can be clarified, then our confidence that ciliary stretch is the proximal stimulus for mechanotransduction will be increased.

      We agree that the displacement is not solely stretching the scolopidium. However, the force is still constrained to, and acts along, the push-pull axis. For this reason, we overestimate the displacement required to open the MET channels, but we stand by our conclusion that stretch is the dominant stimulus. For future work, we wish to devise a technique to mechanically clamp the base of the scolopidium and measure the more physiologically relevant current-strain relationship.

      This conclusion will not come as a surprise for workers in the field, as the chordotonal organ is known as a stretch-receptor organ (e.g., Wikipedia). But it is a useful contribution to the field and allows the authors to suggest transduction mechanisms whereby ciliary stretch is transduced into channel opening.

      One of the goals of this manuscript is to highlight the lack of direct evidence for the stretch sensitivity of chordotonal organs, which is assumed from their structure. More importantly, accepting that chordotonal organs are stretch sensitive does not address the mechanism by which they work. For instance, one candidate for the MET channel, NompC, has been shown to be sensitive to compression (Wang et al., 2021). We find that a preconceived “stretch-sensitive” mechanism, without an appreciation of scolopidial mechanics, cannot explain how NompC can be opened in chordotonal organs.

      P. E. Howse wrote in ‘The Fine Structure and Functional Organisation of Chordotonal Organs’ (Symp. Zool. Soc. Lon., No. 23, 1968):

      “There is, however, a common tendency to refer to chordotonal organs in which scolopidia are contained in a connective tissue strand as “stretch receptor”. This is unfortunate in two senses, for firstly the implied function may not have been proved and secondly even if the organ responds to stretch the scolopidia may not.” He then cited pioneering work on the chordotonal organs of the hermit crab by R.C. Taylor (Comp. Biochem. Physiol., 1966) showing that the scolopidia may experience flexing when the connective strand is stretched.

      This work represents the first efforts to investigate the problematic assumption of stretch-sensitivity of scolopidia since it was first highlighted 57 years ago.

      Reviewer #3 (Public review):

      Summary:

      The paper 'A stretching mechanism evokes mechano-electrical transduction in auditory chordotonal neurons' by Chaiyasitdhi et al. presents a study that aims to address the mechanical model for scolopidia in Schistocerca gregaria Müller's organ, the basic mechanosensory units in insect chordotonal organs. The authors combine high-resolution ultrastructural analysis (FIB-SEM), sound-evoked motion tracking (OCT and high-speed light microscopy), and electrophysiological recordings of transduction currents during direct mechanical stimulation of individual scolopidia. They conclude that axial stretching along the ciliary axis is an adequate mechanical stimulus for activating mechanotransduction channels.

      Strengths/Highlights:

      (1) The 3D FIB-SEM reconstruction provides high resolution of scolopidial architecture, including the newly described "scolopale lid" and the full extent of the cilium.

      (2) High-speed microscopy clearly demonstrates axial stretch as the dominant motion component in the auditory receptors, which answers a long-standing question of what the actual motion of a stretch receptor is upon auditory stimulation.

      (3) Patch-clamp recordings directly link mechanical stretch to transduction currents, a major advance over previous indirect models.

      Weaknesses/Limitations:

      (1) The text is conceptually unclear or written in an unclear manner in some places, for example, when using the proposed model to explain the sensitivity of Nanchung-Inactive in the discussion.

      We will rephrase and clarify the context of our findings for the Nanchung-Inactive mechanism of MET in the introduction and the discussion. We will also refine and simplify unclear text overall.

      (2) The proposed mechanistic models (direct-stretch, stretch-compression, stretch-deformation, stretch-tilt) are compelling but remain speculative without direct molecular or biophysical validation. For example, examining whether the organ is pre-stretched and identifying the mechanical components of cells (tissues), such as the extracellular matrix and cytoskeleton, would help establish the mechanical model and strengthen the conclusion.

      We agree that our four proposed hypotheses are speculative. We have, however, narrowed them down from at least ten previous hypotheses (Field and Matheson, 1998). These hypotheses will enable us, and hopefully the field, to test them and more rapidly advance our understanding of how scolopidia work. We will add a section in the discussion on the best way to experimentally test these four hypotheses (e.g., pushing directly onto the cap should elicit sensitive responses for the cap-compression hypothesis).

      (3) To some extent, the weaknesses of the paper are part of its strengths and vice versa. For example, the direct push/pull and up/down stimulations are a great experimental advance to approach an answer to the question of how the underlying cellular components are deformed and how the underlying ion channels are forced. However, as the authors clearly state, neither of their stimulations can limit all forces to only one direction, and both orthogonal forces evoke responses in the neurons. The question of which of the two orthogonal forces 'causes' the response cannot be answered with these experiments and has not been answered by this manuscript. But the study has brought the field a considerable step closer to answering the question. The answer, however, might be that both longitudinal ('stretch') and perpendicular ('compression') forces act together to open the ion channels and that both dendritic extension via stretch and bending can provide forces for ion channel gating.

      Thank you very much for your acknowledgement of our experimental advances. We agree that this study cannot identify and localise the forces on the cilium, as it is enclosed in the scolopidial unit. As previously explained, we plan to address this question in our next work by improving and expanding our experimental techniques, including modelling, to study the scolopidial mechanics based on our experiments using patch-clamp recording in combination with individual and direct manipulation of the scolopidium.

      The current paper has identified major components (longitudinal stretch components) for the neurons they analysed, but these will surely have been chosen according to their accessibility, and as such, the variety of mechanical responses in Müller's organ might be greater. In light of these considerations, the authors might acknowledge such uncertainties more clearly in their paper.

      Our high-speed and OCT imaging confirms complex multi-dimensional displacements (and presumably forces) acting on the scolopidium. We agree that our mechanical stimulation cannot recapitulate such complex motions. In future work, however, we wish to extend our mechanical stimulation to three axes and also to pivot on the axis of the scolopidial cap.

      The paper is an impressive methodological progress and breakthrough, but it simply does not "demonstrate that axial stretch along the cilium is the adequate stimulus or the key mechanical input that activates mechano-electrical transduction" as the authors write at the start of their discussion.

      We will rephrase to clarify that stretching along the “scolopidial axis”, not “along the ciliary axis”, is the adequate stimulus. We cannot yet verify how this translates to forces acting on the cilium, hence the four speculative hypotheses. We will rewrite the discussion to make clear that we are only interpreting the forces and displacements at the level of the cilium.

      They do show that axial stretch dominates for the neurons they looked at, which is important information. The same applies to the end of the discussion: The authors write, "This relative motion within the organ then drives an axial stretch of the scolopidium, which in turn evokes the mechano-electrical transduction current." Reading the manuscript, the certainty and display of confidence are not substantiated by the data provided. But they are also not necessary. The study has paved the road to answer these questions. Instead, the authors are encouraged to make suggestions on how the remaining uncertainties could be removed (and what experiments or model might be used).

      We will moderate our conclusion in the discussion, but we are confident that we have the experimental repeats, and the statistical tests, to support our conclusion that stretching of the scolopidium provides the largest transduction current responses (although not at the level of the cilium). As mentioned previously, we will include a section in the discussion on the best way to test the hypotheses arising from this work.

    1. eLife Assessment

      This study provides new and interesting findings that SCoR2 acts as a denitrosylase to control cardioprotective metabolic reprogramming and prevent injury following ischemia/reperfusion. The compelling evidence is supported by a novel multi-omics approach, but questions remain regarding the stability and human relevance of BDH1 as well as the sufficiency of SCoR2. Overall, the work will be of interest to cardiovascular researchers and provides valuable information to the field, though some mechanistic aspects require further clarification.

    2. Reviewer #1 (Public review):

      Summary:

      This study shows a novel role for SCoR2 in regulating metabolic pathways in the heart to prevent injury following ischemia/reperfusion. It uses a new multi-omics method to determine SCoR2-mediated metabolic pathways in the heart. This paper would be of interest to cardiovascular researchers working on cardioprotective strategies following ischemic injury in the heart.

      Strengths:

      (1) Use of SCoR2KO mice subjected to I/R injury.

      (2) Identification of multiple metabolic pathways in the heart by a novel multi-omics approach.

      Comments on revisions:

      Authors have addressed all concerns raised in the previous round of review. Substantial modifications have been made in response to those concerns. There are no further comments.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript addresses the gap in knowledge related to the cardiac function of the S-denitrosylase SNO-CoA Reductase 2 (SCoR2; product of the Akr1a1 gene). Genetic variants in SCoR2 have been linked to cardiovascular disease, yet its exact role in the heart remains unclear. This paper demonstrates that mice deficient in SCoR2 show significant protection in a myocardial infarction (MI) model. SCoR2 influenced ketolytic energy production, antioxidant levels, and polyol balance through the S-nitrosylation of crucial metabolic regulators.

      Strengths:

      Addresses a well-defined gap in knowledge related to the cardiac function of SNO-CoA Reductase 2. Besides the in-depth case for this specific player, the manuscript sheds more light on the links between S-nitrosylation and metabolic reprogramming in the heart.

      Rigorous proof of requirement through the combination of gene knockout and in vivo myocardial ischemia/reperfusion.

      Identification of precise Cys residue for SNO-modification of BDH1 as SCoR2 target in cardiac ketolysis.

      Weaknesses:

      The experiments with BDH1 stability were performed in mutant 293 cells. Was there a difference in BDH1 stability in myocardial tissue or primary cardiomyocytes from SCoR2-null vs -WT mice? Same question extends to PKM2.

      In the absence of tracing experiments, the cross-sectional changes in ketolysis, glycolysis or polyol intermediates presented in Figures 4 and 5 are suggestive at best. This needs to be stressed while describing and interpreting these results.

      The findings from human samples with ischemic and non-ischemic cardiomyopathy do not seem immediately or linearly in line with each other and with the model proposed from the KO mice. While the correlation holds up in the non-ischemic cardiomyopathy (increased SNO-BDH1, SNO-PKM2 with decreased SCoR2 expression), how do the Authors explain the decreased SNO-BDH1 with preserved SCoR2 expression in ischemic cardiomyopathy? This seems counterintuitive as activation of ketolysis is a quite established myocardial response to the ischemic stress. It may help the overall message clarity to focus the human data part on only NICM patients.

      (Partially linked to the point above.) An important proof that is lacking at present is the proof of sufficiency for SCoR2 in S-nitrosylation of targets and cardiac remodeling. Does SCoR2 overexpression in heart or isolated cardiomyocytes reduce S-nitrosylation of BDH1 and other targets, undermining heart function at baseline or under stress?

      Comments on revisions:

      Some of my points have been addressed. However, the points related to 1) BDH1 stability effect in cardiomyocytes; 2) human relevance of SNO-BDH1; 3) SCoR2 sufficiency remain unclear. That said, this manuscript will provide useful information to the field as such.

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript demonstrates that mice lacking the denitrosylase enzyme SCoR2/AKR1A1 demonstrate a robust cardioprotection resulting from reprogramming of multiple metabolic pathways, revealing widespread, coordinated metabolic regulation by SCoR2.

      Strengths:

      The extensive experimental evidence provided and the use of the knockout model.

      Weaknesses:

      No direct evidence for the underlying mechanism.

      The mouse model used is not a tissue-specific knock-out.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      This study shows a novel role for SCoR2 in regulating metabolic pathways in the heart to prevent injury following ischemia/reperfusion. It uses a new multi-omics method to determine SCoR2-mediated metabolic pathways in the heart. This paper would be of interest to cardiovascular researchers working on cardioprotective strategies following ischemic injury in the heart.

      Strengths:

      (1) Use of SCoR2KO mice subjected to I/R injury. 

      (2) Identification of multiple metabolic pathways in the heart by a novel multi-omics approach.

      We thank the Reviewer for the positive review of our manuscript.

      Weaknesses:

      (1) Use of global SCoR2KO mice is a limitation, since the effects in the heart may reflect the combined consequences of global loss of SCoR2.

      (2) Lack of a cell type specific effect. 

      We agree that global KOs limit the cell type-specific mechanistic conclusions that can be drawn. Global knockouts are nonetheless informative in their own right and serve to identify phenotypes worthy of further study.

      Reviewer #2 (Public review):

      Summary: 

      This manuscript addresses the gap in knowledge related to the cardiac function of the S-denitrosylase SNO-CoA Reductase 2 (SCoR2; product of the Akr1a1 gene). Genetic variants in SCoR2 have been linked to cardiovascular disease, yet their exact role in the heart remains unclear. This paper demonstrates that mice deficient in SCoR2 show significant protection in a myocardial infarction (MI) model. SCoR2 influenced ketolytic energy production, antioxidant levels, and polyol balance through the S-nitrosylation of crucial metabolic regulators.

      Strengths: 

      (1) Addresses a well-defined gap in knowledge related to the cardiac function of SNO-CoA Reductase 2. Besides the in-depth case for this specific player, the manuscript sheds more light on the links between S-nitrosylation and metabolic reprogramming in the heart.

      (2) Rigorous proof of requirement through the combination of gene knockout and in vivo myocardial ischemia/reperfusion. 

      (3) Identification of precise Cys residue for SNO-modification of BDH1 as SCoR2 target in cardiac ketolysis 

      We thank the Reviewer for their kind words.

      Weaknesses: 

      (1) The experiments with BDH1 stability were performed in mutant 293 cells. Was there a difference in BDH1 stability in myocardial tissue or primary cardiomyocytes from SCoR2-null vs -WT mice? The same question extends to PKM2. 

      We have not assessed BDH1 stability directly in cardiomyocytes. However, S-nitrosylation increased BDH1 stability in HEK293 cells, and BDH1 expression was increased in (injured) hearts of SCoR2KO mice, together with increased SNO-BDH1. 

      For PKM2, there is a wealth of published evidence from us and others that S-nitrosylation does not regulate protein stability but rather inhibits tetramerization required for full activity.  

      (2) In the absence of tracing experiments, the cross-sectional changes in ketolysis, glycolysis, or polyol intermediates presented in Figures 4 and 5 are suggestive at best. This needs to be stressed while describing and interpreting these results. 

      We now acknowledge this limitation in the ‘Limitations’ section of the manuscript and in edits made to the text. 

      (3) The findings from human samples with ischemic and non-ischemic cardiomyopathy do not seem immediately or linearly in line with each other and with the model proposed from the KO mice. While the correlation holds up in the non-ischemic cardiomyopathy (increased SNO-BDH1, SNO-PKM2 with decreased SCoR2 expression), how do the authors explain the decreased SNO-BDH1 with preserved SCoR2 expression in ischemic cardiomyopathy? This seems counterintuitive as activation of ketolysis is a quite established myocardial response to ischemic stress. It may help the overall message clarity to focus the human data part on only NICM patients. 

      We find it interesting and important that SNO-BDH1 is readily detected in human heart tissue and its level is correlated to disease state. Our findings suggest conservation of this mechanism in human heart failure. However, we caution against drawing further conclusions related to NICM or ICM. Our animal model (based on a single time point) cannot faithfully recapitulate patients with chronic heart disease or differences between NICM and ICM. 

      (4) This is partially linked to the point above. An important proof that is lacking at present is the proof of sufficiency for SCoR2 in S-nitrosylation of targets and cardiac remodeling. Does SCoR2 overexpression in the heart or isolated cardiomyocytes reduce S-nitrosylation of BDH1 and other targets, undermining heart function at baseline or under stress? 

      The Reviewer proposes to test the effect of SCoR2 overexpression on cardioprotection. This is an interesting experiment for future study with the following caveats. First, it presupposes that native expression of SCoR2 is insufficient to control basal steady state S-nitrosylation of SNO-BDH1 and SNO-PKM2 (this does not seem to be the case). Second, overexpressed SCoR2 may be mislocalized within cells or associated with unnatural targets. Thank you.

      Reviewer #3 (Public review): 

      Summary: 

      This manuscript demonstrates that mice lacking the denitrosylase enzyme SCoR2/AKR1A1 demonstrate a robust cardioprotection resulting from reprogramming of multiple metabolic pathways, revealing widespread, coordinated metabolic regulation by SCoR2. 

      Strengths: 

      (1) The extensive experimental evidence. 

      (2) The use of the knockout model. 

      We thank the Reviewer for identifying strengths in our work.

      Weaknesses: 

      (1) The connection of direct evidence for the mechanism. 

      We believe we have identified a novel mechanism for cardioprotection entailing coordinate reprogramming of multiple metabolic pathways and suggesting a widescale role for SCoR2 in metabolic regulation. This is the key message we convey. While genetic dissection of individual pathways may be worthwhile, these investigations will have their own limitations. 

      (2) The mouse model used is not tissue-specific. 

      Please see our response to Reviewer 1, above. 

      Reviewer #1 (Recommendations for the authors):

      In the study titled "The denitrosylase SCoR2 controls cardioprotective metabolic reprogramming", Grimmett ZW et al. describe a role for SNO-CoA Reductase 2 (SCoR2) in promoting cardioprotection via metabolic reprogramming in the heart after I/R injury. The authors show that loss of SCoR2 coordinates multiple metabolic pathways to limit infarct size. Overall, the hypothesis is interesting; however, there are some limitations, as described below:

      (1) It is unclear whether SCoR2 mice are global or cardiomyocyte specific. 

      We apologize for any confusion. These are global SCoR2−/− mice. This is now stated in the Results when first identifying the strain, as well as in the Methods.

      (2) Can the authors clarify how divergent metabolic pathways such as ketone oxidation, glycolysis, PPP and polyol metabolism work downstream of SCoR2 to impact cardioprotection in mice with I/R?

      The metabolic pathways of ketone oxidation, glycolysis, PPP and polyols appear to converge to support ischemic cardioprotection in SCoR2−/− mice, as depicted in the model shown in Fig. 5L. Subsequent to SNO-PKM2 blockade of flux through glycolysis (detailed in this manuscript and in Zhou et al., 2019, PMID: 30487609, as well as by others), substrates of ketolysis and glycolysis are funneled into the PPP, producing the antioxidant NADPH and energy precursor phosphocreatine, which are well-known to be cardioprotective. This occurs more readily in SCoR2−/− mice due to elevated SNO-BDH1 (detailed in this manuscript).

      Polyols, thought to be products of the PPP carbohydrate intermediates arabinose, ribulose, xylulose (among others), have recently been shown to be harmful to cardiovascular health in humans. These polyols are uniformly downregulated in SCoR2−/− mice. We suggest this is likely the result of S-nitrosylation of SCoR2-substrate enzymes that form polyols (SCoR2/Akr1a1 is unable to directly reduce carbohydrates to their corresponding polyols). Regulation of endogenous polyol production in humans is a new concept and the mechanisms whereby these compounds increase risk of cardiac events are a subject of active investigation. This is detailed in the final paragraph of both the Results and Discussion sections, and in Fig. 5L.

      (3) The only functional outcomes assessed following SCoR2 loss are echocardiography and measurements of apoptosis. However, it would be important to determine whether the cardioprotective effect persists. It seems cardiac function was recorded 24 hours post-injury, and whether the benefit remains at a later time point such as 2 or 4 weeks is not shown. Without this time point, loss of SCoR2 is shown only to lead to an acute improvement in function.

      Loss of SCoR2 reduced post-MI mortality at 4 hr; cardiac functional changes (plus troponin, LDH, and apoptosis) were studied in surviving animals at 24 hr post-MI. Cardiac response to acute injury and to chronic injury (weeks post-MI) are not the same metabolically. This is well elucidated in the literature and exemplified by the role of PKM2, which is protective in the chronic response to MI (28 days post-MI; PMID: 32078387), but implicated in injury at shorter timepoints post-MI (PMID: 33288902, 28964797). All that said, functional changes at 2-4 weeks will be important to determine in the future, as the Reviewer indicates. 

      Reviewer #2 (Recommendations for the authors): 

      (1) The last paragraph of the Results section should be divided into the statement related to Table S2 in the Results section, and the rest of the paragraph should be put somewhere in the Discussion. 

      Thank you for this suggestion, which we have taken. 

      (2) The number of mice alive/dead should be reported in the histogram in Figure 1G. 

      Done.

      (3) A concise Graphical Abstract will be useful to grasp the overall logic and message of the manuscript from the beginning. 

      We thank you for this suggestion and have added a graphical abstract to the manuscript.

      Reviewer #3 (Recommendations for the authors): 

      I would suggest providing more evidence on which cell type is affected by the metabolic reprogramming. The use of a global knockout is a major limitation, and some in vitro experiments with shRNA knockdown in endothelial cells and fibroblasts would probably provide more insights.

      The reviewer suggests one direction for future study. We identify a novel mechanism for cardioprotection entailing coordinate reprogramming of multiple metabolic pathways and suggesting a widescale role for SCoR2 in metabolic regulation. This is the message we wish to convey. The role of cardiomyocytes vs contributing cell types is a thoughtful direction for future study. Thank you. 

      Editor's additional comment:

      The editors wish to highlight a critical issue concerning the characterization of the SCoR2−/− mice employed in this study. 

      In the Methods section (page 20), the manuscript states that "SCoR2+/− mice were made by Deltagen, Inc. as described previously (33)." However, reference 33 does not describe SCoR2−/− mice; instead, it refers to other genetically modified strains, including Akr1a1+/−, eNOS−/−, and PKM2−/− mice, with no mention of a SCoR2-targeted model. 

      The editors fully acknowledge that the authors may be using the term "SCoR2" as a functional synonym for Akr1a1, based on its described role as a mammalian homologue of yeast SCoR. If this is the case, such equivalence should be explicitly stated in the manuscript to prevent potential confusion. Moreover, considering that the genetic deletion of Akr1a1 (i.e., SCoR2) underlies the key mechanistic findings presented, it is essential that the manuscript include a clear and comprehensive description of the generation and validation of the mouse model used. 

      We therefore ask the authors to (1) clarify the nomenclature and relationship between "SCoR2" and Akr1a1, and (2) provide full details on the generation of the knockout mice, including the targeting strategy and the genotyping procedures. This information is necessary not only to ensure transparency and reproducibility but also to allow readers to fully appreciate the biological relevance of the findings.

      Thank you for identifying this inconsistency. We have adjusted the manuscript text accordingly to clearly state that SCoR2 is a functional name for the product of the Akr1a1 gene and that these SCoR2−/− mice are the same as Akr1a1−/− mice described in Ref 33. We have augmented the Methods text to describe the generation and genotyping of these SCoR2/Akr1a1 knockout mice.

    1. eLife Assessment

      Using high-throughput small-molecule screening, this study discloses novel modulators of the mitochondrial transcription factor A (TFAM), a key regulator of mitochondrial function. Reviewers viewed the targeting of TFAM as innovative and the study's conclusions as potentially important (especially the effects on inflammation). However, the lack of evidence for a direct effect of the compounds on TFAM activity weakens the paper's key conclusion and renders the study incomplete.

    2. Reviewer #1 (Public review):

      Summary:

      The authors identify small-molecule compounds modulating the stability of the mitochondrial transcription factor A (TFAM) using a high-throughput CETSA screen and subsequent secondary assays. The identified compounds increased the protein levels of TFAM without affecting its RNA levels and led to an increase in mtDNA levels. As a read-out for dose-dependent action of the identified compounds, the authors investigated cGAS-STING and ISG activation in cellular inflammation models in the presence or absence of their compounds. The addition of TFAM modulators led to a decrease in cGAS-STING/ISG activation and decreased mtDNA release. Furthermore, beneficial effects could be determined in models of mtDNA disease (rescue of ATP rates), sclerotic fibroblasts (decreased fibrosis), and regulatory T cells (decreased activation of effector T cells). The study thus proposes novel first-in-class regulators of TFAM as a therapeutic option in conditions of mitochondrial dysfunction.

      Strengths:

      The authors identified TFAM as a promising target in conditions of mitochondrial dysfunction, as it is a key regulator of mitochondrial function, serving both as a transcription and packaging factor of mtDNA. Importantly, TFAM is a key regulator of mtDNA copy number, and a moderate increase in TFAM/mtDNA levels has been shown to be beneficial in a number of pathological conditions. Furthermore, mtDNA release leading to activation of inflammatory responses has been linked to a variety of pathological conditions in the last decade. Thus, the identification of small molecule modulators of TFAM that have the potential to increase mtDNA copy number and decrease inflammatory signaling is of great importance. Furthermore, the authors highlight potential applications in the field of mitochondrial disease, fibrosis, and autoimmune disease.

      Weaknesses:

      The central weakness of the study is the fact that the authors propose compounds as modulators or even activators of TFAM without sufficiently proving a direct effect on TFAM itself. There are no data indicating a direct effect on TFAM activity (e.g., mtDNA transcription, replication, packaging), and it is not sufficiently ruled out that other proteins (e.g., LONP1) mediate the effect. Additionally, important information on the performed screen is not provided. Thus, the data presented is currently incomplete to support the described findings. Furthermore, the introduction and discussion are lacking key references.

    3. Reviewer #2 (Public review):

      Summary:

      The present paper aims to identify small molecules that could possibly affect mitochondrial DNA (mtDNA) stability, limiting cytosolic mtDNA abundance and activation of interferon signaling. The authors developed a high-throughput screen incorporating HiBiT technology to identify possible target compounds affecting the content of mitochondrial transcription factor A (TFAM), a protein known to impact mtDNA stability. Cells were subsequently exposed to target compounds to investigate the impact on TNFα-stimulated interferon signaling, a process activated by cytosolic mtDNA abundance. Compound 2, an analog of arylsulfonamide, was highlighted as a possible TFAM activator, and emphasized as a small molecule that could stabilize mtDNA and prevent stress-induced interferon signaling.

      Strengths:

      Identifying compounds that positively affect mitochondrial biology has diverse implications. The combination of high-throughput screening and assay development to connect identified compounds with cellular interferon signalling events is a strength of the current approach, and the authors should be commended for identifying compounds that broadly impact interferon signalling. The authors have incorporated diverse measurements, including TFAM content, mtDNA content, interferon signaling, and ATP content, as well as verified the necessity of TFAM in mediating the beneficial effects of the emphasized small molecule (Compound 2).

      Weaknesses:

      (1) While the identified compound clearly works through TFAM, Compound 2 was identified as an arylsulfonamide, which would be expected to affect voltage-gated sodium channels (e.g. PMID: 31316182). Alterations in cellular sodium content and membrane polarization could affect metabolism to indirectly influence mtDNA and TFAM content. It remains unclear if this compound directly or indirectly affects TFAM content, especially as the authors have utilized various cancer cell lines, which could have aberrant sodium channels.

      (2) TFAM is nuclear encoded - if this compound directly functions to 'activate TFAM', why/how would TFAM content increase independent of nuclear transcription?

      (3) While a listed strength is the incorporation of diverse readouts, this is also a weakness, as there is a lack of consistency between approaches. For instance, data is not provided to show compound 2 increases TFAM or mtDNA content following TNFα stimulation, and extrapolating between cell lines may not be appropriate. The authors are encouraged to directly report TFAM and mtDNA for target compounds 2 and 15 to support their data reported in Figure 2. Ideally, the authors would also report for compound 1 as a control.

      (4) While the authors indicate compound 11 displayed the strongest effect on ISRE activity, this appears not to be identified in Figure 1B as a compound affecting TFAM content. Can the authors label the various compounds in Figure 1B to better highlight the relationship between compounds and TFAM content?

      (5) The authors suggest Compound 2 increases cellular ATP - but they are encouraged to normalize luminescence to cellular protein and OXPHOS content to better interpret this data. Additionally, the authors are encouraged to report cellular ATP content following TNFα stimulation/stress (the key emphasis of the present data) and test compound 11, which the authors have implicated as a more sensitive compound.

      The discussion is really a perspective, theorizing the diverse implications of small molecule activation of TFAM. The authors are encouraged to provide a balanced discussion, including a critical evaluation of their own work, including an acknowledgement that evidence is not provided that Compound 2 directly activates TFAM or decreases mtDNA cytosolic leakage.

    1. eLife Assessment

      This study presents a useful inventory of genes that are up- and down-regulated in the mouse small intestine (duodenum and ileum) during the first postnatal month; the data were collected and analyzed using solid and validated methodology and can be used as a starting point for additional validation of specific markers and for follow-up functional studies. Some aspects of the study were incomplete, with claims being only partially supported by the data, and it is suggested that additional validation be performed. The authors attempted to correlate gene expression changes with periods of high and low NEC susceptibility, but these correlations are speculative and not supported by functional follow-up studies. Discussion of gene expression changes with NEC susceptibility would be more appropriate to include in the Discussion section and to be tempered in the results section.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors aimed to clarify the transcriptional changes across murine postnatal small intestinal development (0 days to 1 month) in both the duodenum and ileum, a period that shows morphological similarity to 20-30 week old fetal humans. This is an especially critical stage in human intestinal development, as necrotizing enterocolitis (NEC) usually manifests during these stages.

      Strengths:

      The authors assessed numerous timepoints between 0 days and 1 month in the postnatal mouse duodenum and ileum using bulk transcriptomics of bulk-isolated tissues. Cellular deconvolution, based on relative marker expression, was used to clarify immune cell proportions in the bulk RNA sequencing data. They confirmed some of the transcriptional targets identified in vivo, primarily in mouse via qRT-PCR and immunohistochemistry but also in human fetal tissues and isolated organoids, and these validation data are of decent quality.

      Weaknesses:

      The overall weakness of this study, as mentioned by the authors themselves, is that the bulk transcriptomic data generated for the study were isolated from non-fractionated bulk intestinal tissue. This makes it difficult to interpret much of the data in terms of the cellular fractions present across developmental time. It is difficult to rationalize this approach, as isolation protocols for epithelial-only or mesenchyme-only tissues for bulk RNA sequencing are well established. The authors address some of these concerns using cellular deconvolution for immune cell populations, which I think would be more helpful if they expanded the analysis to other cell types (mesenchyme, endothelium, glia). However, I would assume that bulk isolations across developmental time are going to be influenced primarily by the predominant tissue type at each time point, which is primarily epithelium. This is also supported by the immune transcripts becoming more apparent later in their time series, as this system becomes more established during weaning. The study might also be strengthened by comparison with publicly available data on early fetal-stage development in humans. Comparisons between the duodenum and ileum could be strengthened by what we already know from adult data, from both epithelial- and mesenchyme-isolated fractions. The rationale for using the postnatal mouse as a comparison for NEC is also a little unclear: perhaps some of the developmental processes are similar, but the environments are completely different. For example, even in early postnatal mouse development, you would find microbial activity and milk.

    3. Reviewer #2 (Public review):

      Summary:

      This work presents a valuable resource by generating a comprehensive bulk RNA sequencing catalogue of gene expression in the mouse duodenum and ileum during the first postnatal month. The central findings of this work are based on an analysis of this dataset. Specifically, the authors characterized molecular shifts that occur as the intestine matures from an immature to an adult-like state, investigating both temporal changes and regional differences between the proximal and distal small intestine. A key objective was to identify gene expression patterns relevant to understanding the region-specific susceptibility and resistance to necrotizing enterocolitis (NEC) observed in humans during the postnatal period. They also sought to validate key findings through complementary methods and to provide comparative context with human intestinal samples. This study will provide a solid reference dataset for the community of researchers studying postnatal gastrointestinal development and diseases that arise during these stages. However, the study lacks functional validation of the interpretations.

      Strengths:

      (1) The inclusion of numerous time points (day 0 through 4 weeks) and comparative analyses throughout the first postnatal month.

      (2) Validation of key interpretations of RNA-seq data by other methods.

      (3) Linking mouse postnatal development to human premature infant development, enhancing its clinical relevance, particularly for NEC research. The inclusion of human intestinal biopsy and organoid data for comparison further strengthens this link.

      (4) The investigation covers a wide array of developmental gene categories with known significance, including epithelial differentiation markers (e.g., Vil1, Muc2, Lyz1), intestinal stem cell markers (e.g., Lgr5, Olfm4, Ascl2), mesenchymal markers (e.g., Pdgfra, Vim), Wnt signaling components (e.g., Wnt3, Wnt5a, Ctnnb1), and various immune genes (e.g., defensins, T cell, B cell, ILC, macrophage markers).

      Weaknesses:

      (1) The primary limitation is that there is no functional validation. The study primarily focuses on the interpretation of RNA expression. This is a common limitation of transcriptomic "atlas" studies, but the functional and mechanistic relevance of these interpretations remains to be determined.

      (2) The data are derived from bulk RNA-Seq of full-thickness intestinal tissue. While this approach helps capture rare cell types and both epithelial and mesenchymal components simultaneously, it does not provide cell-type-specific gene expression profiles, which might obscure important nuances. Future investigations using single-cell sequencing would be a logical follow-up.

      (3) The day 4 samples were omitted due to quality issues, which might have led to missing some dynamic changes, especially given that some ISC genes show dynamic changes around day 6.

    4. Reviewer #3 (Public review):

      Summary:

      This study uses bulk mRNA sequencing to profile transcriptional changes in intestinal cells during the early postnatal period in mice - a developmental window that has received relatively little attention despite its importance. This developmental stage is particularly significant because it parallels late gestation in humans, a time when premature infants are highly vulnerable to necrotizing enterocolitis (NEC). By sampling closely spaced timepoints from birth through postnatal week four, the authors generate a resource that helps define transcriptional trajectories during this phase. Although the primary focus is on murine tissue, the authors also present limited data from human fetal intestinal biopsy samples and organoids. In addition, they discuss potential links between observed gene expression changes and factors that may contribute to NEC.

      Strengths:

      The close temporal sampling in mice offers a detailed view of dynamic transcriptional changes across the first four weeks after birth. The authors leverage these close timepoints to perform hierarchical clustering to define relationships between developmental stages. This is a useful approach, as it highlights when transcriptional states shift most dramatically and allows for functional predictions about classes of genes that vary over time. This high-level analysis provides an effective entry point into the dataset and will be useful for future investigations. The inclusion of human fetal intestinal samples, although limited, is especially notable given the scarcity of data from late fetal timepoints. The authors are generally careful in their presentation of results, acknowledging the limitations of their approach and avoiding over-interpretation. As they note, this dataset is intended as a foundation for their lab and others, with secondary approaches required to more fully explore the biological questions raised.

      Weaknesses:

      One limitation of the study is the use of bulk mRNA sequencing to draw conclusions about individual cell types. It has been documented that few genes are exclusively expressed in a single cell type. For instance, markers such as Lgr5 and Olfm4 are enriched in intestinal stem cells (ISCs), but they are also expressed at lower levels in other lineages and in differentiating cells. Using these markers as proxies for specific cell populations lowers confidence in the conclusions, particularly without complementary validation to confirm cell type-specific dynamics.

      Validation of the sequencing data was itself limited, relying primarily on qPCR, which measures expression in the same modality (RNA) rather than providing orthogonal support. It is unclear how the authors selected the subset of genes for validation; many key genes highlighted in the sequencing data were not assessed. Moreover, the regional differences reported for Lgr5, Olfm4, and Ascl2, which appeared much higher in proximal samples than in distal ones, were not recapitulated by qPCR validation of Olfm4, and this discrepancy was not addressed. Resolving such inconsistencies will be important for interpreting the dataset.

      The basis for linking particular gene sets to NEC susceptibility rests largely on their spatial restriction to the distal intestine and their temporal regulation between early (day 0-14) and later (weeks 3-4) developmental stages. While this is a reasonable approach for generating hypotheses, the correlations have limited interpretive power without experimental validation, which is not provided here. Many factors beyond NEC may drive regional and temporal differences in intestinal development.

      Finally, the contribution of human fetal biopsy samples is minimal. The central figure presenting these data (Figure 4A) shows immunofluorescence for LGR5, a single stem cell marker. The staining at day 35 is not convincing, and the conclusions that can be drawn are limited to confirming the localization of LGR5-positive cells to crypts as early as 26 weeks.

    1. eLife Assessment

      This valuable study examined the roles of the posterior parietal cortex in rats performing an auditory change-detection decision task. It provided solid evidence for two subpopulations with opposing modulation patterns during decision formation and for a correspondence between neural and behavioral measures of the short timescale used for evidence evaluation.

    2. Joint Public Review:

      In this study, the authors sought to characterize the relationship between the timescales of evidence integration in an auditory change detection task and neural activity dynamics in the rat posterior parietal cortex (PPC), an area that has been implicated in the accumulation of sensory evidence. Using state-of-the-art Neuropixels recording techniques, they identified two subpopulations of neurons whose firing rates were positively and negatively modulated by auditory clicks. The timescale of the click-related response was similar to the behaviorally measured timescale for evidence evaluation. The click-related response of positively modulated neurons also depended on when the clicks were presented, which the authors hypothesized to reflect a time-dependent gain change that implements an urgency signal. Using muscimol injections to inactivate the PPC, they showed that PPC inactivation affected the rats' choices and reaction times.

      There are several strengths of this study, including:

      (1) Compelling evidence for short temporal integration in behavioral and neural data for this task.

      (2) Well-executed and interpretable comparisons of psychophysical reverse correlation with single-trial, click-triggered neuronal analyses to relate behavior and neural activity.

      (3) Inactivation experiments to test for causality.

      (4) Characterization of neural subpopulations that allows for complex relationships between a brain region and behavior.

      (5) Experimental evidence for an interesting way to use sensory gain change to implement urgency signals.

      There are also some concerns, including:

      (1) The work could be better contextualized. From a normative Bayesian perspective, the observed adaptation of timescales and gain aligns closely with optimal strategies for change detection in noisy streams: placing greater weight on recent sensory samples and lowering evidence requirements as decision urgency grows. However, the manuscript could go further in explicitly connecting the experimental findings to normative models, such as leaky accumulator or dynamic belief-updating frameworks. This would strengthen the broader impact of the work by making clear how the observed PPC dynamics instantiate computationally optimal strategies.

      (2) It is unclear how the rats are performing the task, both in terms of the quality of performance (only hit rates are shown, but the rats also seem to have high false alarm rates) and in terms of the underlying strategy that they seem to be using.

      (3) A major conceptual weakness lies in the claim that PPC "dynamically modulates evidence evaluation in a time-adaptive manner to suit the behavioral demands of a free-response change detection task." Supporting this claim would require a direct comparison of neural activity between two task demands, either in two tasks or in one task with manipulations that promote the adoption of different timescales.

      (4) Some analyses of neural data are lacking or seem incomplete, without considering alternative interpretations.

      (5) The muscimol inactivation results did not allow a clear interpretation of the link between PPC activity and decision performance.

    1. eLife Assessment

      This study presents valuable findings regarding a rare mode of reproduction called hybridogenesis in a species pair of frogs. While parts of the study provide solid support for the claim of hybridogenesis, other parts are incomplete, with certain claims being only partially supported, as alternative modes of reproduction cannot be fully ruled out.